BINAURAL SIGNAL PROCESSING TECHNIQUES



CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of commonly owned, co-pending United
States Patent Application Serial No. 08/666,757, filed on June 19, 1996 to Feng et al., and
entitled BINAURAL SIGNAL PROCESSING SYSTEM AND METHOD.

BACKGROUND OF THE INVENTION 
The present invention is directed to the processing of acoustic signals, and more 
particularly, but not exclusively, relates to the localization and extraction of acoustic signals 
emanating from different sources. 

The difficulty of extracting a desired signal in the presence of interfering signals is a 
long-standing problem confronted by acoustic engineers. This problem impacts the design 
and construction of many kinds of devices such as systems for voice recognition and 
intelligence gathering. Especially troublesome is the separation of desired sound from 
unwanted sound with hearing aid devices. Generally, hearing aid devices do not permit 
selective amplification of a desired sound when contaminated by noise from a nearby source,
particularly when the noise is more intense. This problem is even more severe when the
desired sound is a speech signal and the nearby noise is also a speech signal produced by 
multiple talkers (e.g. babble). As used herein, "noise" refers to random or nondeterministic 
signals and alternatively or additionally refers to any undesired signals and/or any signals 
interfering with the perception of a desired signal. 

One attempted solution to this problem has been the application of a single, highly 
directional microphone to enhance directionality of the hearing aid receiver. This approach 
has only a very limited capability. As a result, spectral subtraction, comb filtering, and 
speech-production modeling have been explored to enhance single microphone performance. 
Nonetheless, these approaches still generally fail to improve intelligibility of a desired speech 
signal, particularly when the signal and noise sources are in close proximity. 

Another approach has been to arrange a number of microphones in a selected spatial 
relationship to form a type of directional detection beam. Unfortunately, when limited to a 
size practical for hearing aids, beamforming arrays also have limited capacity to separate
signals that are close together, especially if the noise is more intense than the desired speech
signal. In addition, in the case of one noise source in a less reverberant environment, the
noise cancellation provided by the beamformer varies with the location of the noise source in
relation to the microphone array. R.W. Stadler and W.M. Rabinowitz, On the Potential of
Fixed Arrays for Hearing Aids, 94 Journal of the Acoustical Society of America 1332
(September 1993), and W. Soede et al., Development of a Directional Hearing Instrument
Based on Array Technology, 94 Journal of the Acoustical Society of America 785 (August
1993) are cited as additional background concerning the beamforming approach.

Still another approach has been the application of two microphones displaced from one
another to provide two signals to emulate certain aspects of the binaural hearing system
common to humans and many types of animals. Although certain aspects of biologic
binaural hearing are not fully understood, it is believed that the ability to localize sound
sources is based on evaluation by the auditory system of binaural time delays and sound
levels across different frequency bands associated with each of the two sound signals. The
localization of sound sources with systems based on these interaural time and intensity
differences is discussed in W. Lindemann, Extension of a Binaural Cross-Correlation Model
by Contralateral Inhibition - I. Simulation of Lateralization for Stationary Signals, 80 Journal
of the Acoustical Society of America 1608 (December 1986).

The localization of multiple acoustic sources based on input from two microphones
presents several significant challenges, as does the separation of a desired signal once the
sound sources are localized. For example, the system set forth in Markus Bodden, Modeling
Human Sound-Source Localization and the Cocktail-Party-Effect, 1 Acta Acustica 43
(February/April 1993) employs a Wiener filter including a windowing process in an attempt
to derive a desired signal from binaural input signals once the location of the desired signal
has been established. Unfortunately, this approach results in significant deterioration of
desired speech fidelity. Also, the system has only been demonstrated to suppress noise of
equal intensity to the desired signal at an azimuthal separation of at least 30 degrees. A more
intense noise emanating from a source spaced closer than 30 degrees from the desired source
continues to present a problem. Moreover, the proposed algorithm of the Bodden system is
computationally intense, posing a serious question of whether it can be practically
embodied in a hearing aid device.

Another example of a two microphone system is found in D. Banks, Localisation and 
Separation of Simultaneous Voices with Two Microphones. IEE Proceedings-I, 140 (1993). 
This system employs a windowing technique to estimate the location of a sound source when
there are nonoverlapping gaps in its spectrum compared to the spectrum of interfering noise.
This system cannot perform localization when wide-band signals lacking such gaps are 
involved. In addition, the Banks article fails to provide details of the algorithm for 
reconstructing the desired signal. U.S. Patent Nos. 5,479,522 to Lindemann et al.; 5,325,436 
to Soli et al.; 5,289,544 to Franklin; and 4,773,095 to Zwicker et al. are cited as sources of 
additional background concerning dual microphone hearing aid systems. 

Effective localization is also often hampered by ambiguous positional information that 
results above certain frequencies related to the spacing of the input microphones. This 
problem was recognized in Stern, R. M., Zeiberg, A. S., and Trahiotis, C. "Lateralization of
complex binaural stimuli: A weighted-image model," J. Acoust. Soc. Am. 84, 156-165 
(1988). 

Thus, a need remains for more effective localization and extraction techniques — 
especially for use with binaural systems. The present invention meets these needs and offers 
other significant benefits and advantages.

SUMMARY OF THE INVENTION



The present invention relates to the processing of acoustic signals. Various aspects of
the invention are novel, nonobvious, and provide various advantages. While the actual nature
of the invention covered herein can only be determined with reference to the claims appended
hereto, selected forms and features of the preferred embodiments as disclosed herein are
described briefly as follows.

One form of the present invention includes a unique signal processing technique for
localizing and characterizing each of a number of differently located acoustic sources. This
form may include two spaced apart sensors to detect acoustic output from the sources. Each
source, or one particular selected source, may be extracted while suppressing the output of
the other sources. A variety of applications may benefit from this technique including hearing
aids, sound location mapping or tracking devices, and voice recognition equipment, to name
a few.

In another form, a first signal is provided from a first acoustic sensor and a second
signal from a second acoustic sensor spaced apart from the first acoustic sensor. The first and
second signals each correspond to a composite of two or more acoustic sources that, in turn,
include a plurality of interfering sources and a desired source. The interfering sources are
localized by processing of the first and second signals to provide a corresponding number of
interfering source signals. These signals each include a number of frequency components.
One or more of the frequency components are suppressed for each of the interfering source
signals. This approach facilitates nulling a different frequency component for each of a
number of noise sources with two input sensors.

A further form of the present invention is a processing system having a pair of sensors
and a delay operator responsive to a pair of input signals from the sensors to generate a
number of delayed signals therefrom. The system also has a localization operator responsive
to the delayed signals to localize the interfering sources relative to the location of the sensors
and provide a plurality of interfering source signals each represented by a number of
frequency components. The system further includes an extraction operator that serves to
suppress selected frequency components for each of the interfering source signals and extract
a desired signal corresponding to a desired source. An output device responsive to the
desired signal is also included that provides an output representative of the desired source.
This system may be incorporated into a signal processor coupled to the sensors to facilitate
localizing and suppressing multiple noise sources when extracting a desired signal.

Still another form is responsive to position-plus-frequency attributes of sound sources. 
It includes positioning a first acoustic sensor and a second acoustic sensor to detect a plurality 
of differently located acoustic sources. First and second signals are generated by the first and 
second sensors, respectively, that receive stimuli from the acoustic sources. A number of 
delayed signal pairs are provided from the first and second signals that each correspond to 
one of a number of positions relative to the first and second sensors. The sources are 
localized as a function of the delayed signal pairs and a number of coincidence patterns. 
These patterns are position and frequency specific, and may be utilized to recognize and 
correspondingly accumulate position data estimates that map to each true source position. As 
a result, these patterns may operate as filters to provide better localization resolution and 
eliminate spurious data. 

In yet another form, a system includes two sensors each configured to generate a 
corresponding first or second input signal and a delay operator responsive to these signals to 
generate a number of delayed signals each corresponding to one of a number of positions 
relative to the sensors. The system also includes a localization operator responsive to the 
delayed signals for determining a number of sound source localization signals. These
localization signals are determined from the delayed signals and a number of coincidence 
patterns that each correspond to one of the positions. The patterns each relate frequency 
varying sound source location information caused by ambiguous phase multiples to a 
corresponding position to improve acoustic source localization. The system also has an 
output device responsive to the localization signals to provide an output corresponding to at 
least one of the sources. 

A further form utilizes two sensors to provide corresponding binaural signals from 
which the relative separation of a first acoustic source from a second acoustic source may be 
established as a function of time, and the spectral content of a desired acoustic signal from 
the first source may be representatively extracted. Localization and identification of the 
spectral content of the desired acoustic signal may be performed concurrently. This form 
may also successfully extract the desired acoustic signal even if a nearby noise source is of 
greater relative intensity. 

Another form of the present invention employs a first and second sensor at different
locations to provide a binaural representation of an acoustic signal which includes a desired
signal emanating from a selected source and interfering signals emanating from several
interfering sources. A processor generates a discrete first spectral signal and a discrete
second spectral signal from the sensor signals. The processor delays the first and second
spectral signals by a number of time intervals to generate a number of delayed first signals
and a number of delayed second signals and provide a time increment signal. The time
increment signal corresponds to separation of the selected source from the noise source. The
processor generates an output signal as a function of the time increment signal, and an output
device responds to the output signal to provide an output representative of the desired signal.

An additional form includes positioning a first and second sensor relative to a first
signal source with the first and second sensor being spaced apart from each other and a
second signal source being spaced apart from the first signal source. A first signal is
provided from the first sensor and a second signal is provided from the second sensor. The
first and second signals each represent a composite acoustic signal including a desired signal
from the first signal source and unwanted signals from other sound sources. A number of
spectral signals are established from the first and second signals as functions of a number of
frequencies. A member of the spectral signals representative of position of the second signal
source is determined, and an output signal is generated from the member which is
representative of the first signal source. This feature facilitates extraction of a desired signal
from a spectral signal determined as part of the localization of the interfering source. This
approach can avoid the extensive post-localization computations required by many binaural
systems to extract a desired signal.

Accordingly, it is one object of the present invention to provide for the enhanced 
localization of multiple acoustic sources. 

It is another object to extract a desired acoustic signal from a noisy environment caused
by a number of interfering sources.

An additional object is to provide a system for the localization and extraction of 
acoustic signals by detecting a combination of these signals with two differently located 
sensors. 

Further embodiments, objects, features, aspects, benefits, forms, and advantages of the
present invention shall become apparent from the detailed drawings and descriptions
provided herein.

BRIEF DESCRIPTION OF THE DRAWINGS



FIG. 1 is a diagrammatic view of a system of one embodiment of the present invention. 
FIG. 2 is a signal flow diagram further depicting selected aspects of the system of FIG. 1.

FIG. 3 is a schematic representation of the dual delay line of FIG. 2.

FIGS. 4A and 4B depict other embodiments of the present invention corresponding to
hearing aid and computer voice recognition applications, respectively. 

FIG. 5 is a graph of a speech signal in the form of a sentence about 2 seconds long. 

FIG. 6 is a graph of a composite signal including babble noise and the speech signal of 
FIG. 5 at a 0 dB signal-to-noise ratio with the babble noise source at about a 60 degree azimuth
relative to the speech signal source. 

FIG. 7 is a graph of a signal representative of the speech signal of FIG. 5 after 
extraction from the composite signal of FIG. 6. 

FIG. 8 is a graph of a composite signal including babble noise and the speech signal of 
FIG. 5 at a -30 dB signal-to-noise ratio with the babble noise source at a 2 degree azimuth 
relative to the speech signal source. 

FIG. 9 is a graphic depiction of a signal representative of the sample speech signal of 
FIG. 5 after extraction from the composite signal of FIG. 8. 

FIG. 10 is a signal flow diagram of another embodiment of the present invention. 

FIG. 11 is a partial signal flow diagram illustrating selected aspects of the dual delay
lines of FIG. 10 in greater detail. 

FIG. 12 is a diagram illustrating selected geometric features of the embodiment 
illustrated in FIG. 10 for a representative example of one of a number of sound sources. 

FIG. 13 is a signal flow diagram illustrating selected aspects of the localization operator 
of FIG. 10 in greater detail. 

FIG. 14 is a diagram illustrating yet another embodiment of the present invention. 

FIG. 15 is a signal flow diagram further illustrating selected aspects of the embodiment 
of FIG. 14. 

FIG. 16 is a signal flow diagram illustrating selected aspects of the localization operator 
of FIG. 15 in greater detail. 

FIG. 17 is a graph illustrating a plot of coincidence loci for two sources.

FIG. 18 is a graph illustrating coincidence patterns for azimuth positions corresponding
to -75°, 0°, 20°, and 75°.

FIGS. 19-22 are tables depicting experimental results obtained with the present
invention.

DESCRIPTION OF THE SELECTED EMBODIMENTS



For the purposes of promoting an understanding of the principles of the invention, 
reference will now be made to the embodiment illustrated in the drawings and specific 
language will be used to describe the same. It will nevertheless be understood that no 
limitation of the scope of the invention is thereby intended. Any alterations and further 
modifications in the described embodiments, and any further applications of the principles of 
the invention as described herein are contemplated as would normally occur to one skilled in 
the art to which the invention relates. 

FIG. 1 illustrates an acoustic signal processing system 10 of one embodiment of the
present invention. System 10 is configured to extract a desired acoustic signal from source 
12 despite interference or noise emanating from nearby source 14. System 10 includes a pair 
of acoustic sensors 22, 24 configured to detect acoustic excitation that includes signals from 
sources 12, 14. Sensors 22, 24 are operatively coupled to processor 30 to process signals 
received therefrom. Also, processor 30 is operatively coupled to output device 90 to provide 
a signal representative of a desired signal from source 12 with reduced interference from 
source 14 as compared to composite acoustic signals presented to sensors 22, 24 from sources 
12, 14. 

Sensors 22, 24 are spaced apart from one another by distance D along lateral axis T. 
Midpoint M represents the halfway point along distance D from sensor 22 to sensor 24. 
Reference axis R1 is aligned with source 12 and intersects axis T perpendicularly through
midpoint M. Axis N is aligned with source 14 and also intersects midpoint M. Axis N is
positioned to form angle A with reference axis R1. FIG. 1 depicts an angle A of about 20
degrees. Notably, reference axis R1 may be selected to define a reference azimuthal position
of zero degrees in an azimuthal plane intersecting sources 12, 14; sensors 22, 24; and
containing axes T, N, R1. As a result, source 12 is "on-axis" and source 14, as aligned with
axis N, is "off-axis." Source 14 is illustrated at about a 20 degree azimuth relative to source 
12. 

Preferably sensors 22, 24 are fixed relative to each other and configured to move in 
tandem to selectively position reference axis R1 relative to a desired acoustic signal source.
It is also preferred that sensors 22, 24 be microphones of a conventional variety, such as 
omnidirectional dynamic microphones. In other embodiments, a different sensor type may be 
utilized as would occur to one skilled in the art.

Referring additionally to FIG. 2, a signal flow diagram illustrates various processing
stages for the embodiment shown in FIG. 1. Sensors 22, 24 provide analog signals Lp(t) and
Rp(t) corresponding to the left sensor 22, and right sensor 24, respectively. Signals Lp(t) and 
Rp(t) are initially input to processor 30 in separate processing channels L and R. For each 
channel L, R, signals Lp(t) and Rp(t) are conditioned and filtered in stages 32a, 32b to reduce 
aliasing, respectively. After filter stages 32a, 32b, the conditioned signals Lp(t), Rp(t) are 
input to corresponding Analog to Digital (A/D) converters 34a, 34b to provide discrete 
signals Lp(k), Rp(k), where k indexes discrete sampling events. In one embodiment, A/D 
stages 34a, 34b sample signals Lp(t) and Rp(t) at a rate of at least twice the frequency of the 
upper end of the audio frequency range to assure a high fidelity representation of the input 
signals. 

Discrete signals Lp(k) and Rp(k) are transformed from the time domain to the 
frequency domain by a short-term Discrete Fourier Transform (DFT) algorithm in stages 36a, 
36b to provide complex-valued signals XLp(m) and XRp(m). Signals XLp(m) and XRp(m) 
are evaluated in stages 36a, 36b at discrete frequencies f_m, where m is an index (m=1 to
m=M) to discrete frequencies, and index p denotes the short-term spectral analysis time
frame. Index p is arranged in reverse chronological order with the most recent time frame
being p = 1, the next most recent time frame being p = 2, and so forth. Preferably, frequencies
f_m encompass the audible frequency range and the number of samples employed in the
short-term analysis is selected to strike an optimum balance between processing speed limitations
and desired resolution of resulting output signals. In one embodiment, an audio range of 0.1 
to 6 kHz is sampled in A/D stages 34a, 34b at a rate of at least 12.5 kHz with 512 samples per 
short-term spectral analysis time frame. In alternative embodiments, the frequency domain 
analysis may be provided by an analog filter bank employed before A/D stages 34a, 34b. It 
should be understood that the spectral signals XLp(m) and XRp(m) may be represented as
arrays each having a 1×M dimension corresponding to the different frequencies f_m.
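
As a rough illustration of the short-term spectral analysis of stages 36a, 36b, the following
Python sketch computes a naive DFT of a single analysis frame and maps a bin index m to a
frequency in hertz. The names short_term_dft and bin_frequency_hz are hypothetical, the
64-sample frame is chosen only to keep the example fast (the embodiment above uses 512
samples at 12.5 kHz or more), and no window function is applied because none is specified
in the text.

```python
import cmath
import math


def short_term_dft(frame):
    """Naive O(n^2) DFT of one analysis frame (a list of real samples)."""
    n = len(frame)
    return [
        sum(frame[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def bin_frequency_hz(m, frame_len, sample_rate_hz):
    """Frequency of DFT bin m for the given frame length and sampling rate."""
    return m * sample_rate_hz / frame_len


# Assumed demo values: 12.5 kHz sampling as in the text, short 64-sample frame.
SAMPLE_RATE_HZ = 12500.0
FRAME_LEN = 64
TONE_BIN = 8  # test tone placed exactly on a bin, so there is no leakage

frame = [math.cos(2 * math.pi * TONE_BIN * k / FRAME_LEN) for k in range(FRAME_LEN)]
spectrum = short_term_dft(frame)

# For a real-valued input only the bins up to the Nyquist frequency are independent.
half_spectrum = spectrum[: FRAME_LEN // 2 + 1]
peak_bin = max(range(len(half_spectrum)), key=lambda m: abs(half_spectrum[m]))
peak_hz = bin_frequency_hz(peak_bin, FRAME_LEN, SAMPLE_RATE_HZ)
```

With the 512-sample frame and 12.5 kHz rate mentioned above, adjacent bins would be spaced
12500/512, or about 24.4 Hz, apart.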

Spectral signals XLp(m) and XRp(m) are input to dual delay line 40 as
further detailed in FIG. 3. FIG. 3 depicts two delay lines 42, 44 each having N number of
delay stages. Each delay line 42, 44 is sequentially configured with delay stages D_1 through
D_N. Delay lines 42, 44 are configured to delay corresponding input signals in opposing
directions from one delay stage to the next, and generally correspond to the dual hearing
channels associated with a natural binaural hearing process. Delay stages D_1, D_2, D_3, ...,
D_(N-2), D_(N-1), and D_N each delay an input signal by corresponding time delay increments
τ_1, τ_2, τ_3, ..., τ_(N-2), τ_(N-1), and τ_N (collectively designated τ_i), where index i goes
from left to right. For delay line 42, XLp(m) is alternatively designated XLp^1(m). XLp^1(m) is
sequentially delayed by time delay increments τ_1, τ_2, τ_3, ..., τ_(N-2), τ_(N-1), and τ_N to
produce delayed outputs at the taps of delay line 42 which are respectively designated
XLp^2(m), XLp^3(m), XLp^4(m), ..., XLp^(N-1)(m), XLp^N(m), and XLp^(N+1)(m) (collectively
designated XLp^i(m)). For delay line 44, XRp(m) is alternatively designated XRp^(N+1)(m).
XRp^(N+1)(m) is sequentially delayed by time delay increments τ_1, τ_2, τ_3, ..., τ_(N-2),
τ_(N-1), and τ_N to produce delayed outputs at the taps of delay line 44 which are respectively
designated XRp^N(m), XRp^(N-1)(m), XRp^(N-2)(m), ..., XRp^3(m), XRp^2(m), and XRp^1(m)
(collectively designated XRp^i(m)). The input spectral signals and the signals from delay
line 42, 44 taps are arranged as input pairs to operation array 46. A pair of taps from delay
lines 42, 44 is illustrated as input pair P in FIG. 3.
Operation array 46 has operation units (OP) numbered from 1 to N+1, depicted as
OP1, OP2, OP3, OP4, ..., OPN-2, OPN-1, OPN, OPN+1 and collectively designated
operations OPi. Input pairs from delay lines 42, 44 correspond to the operations of array 46
as follows: OP1[XLp^1(m), XRp^1(m)], OP2[XLp^2(m), XRp^2(m)], OP3[XLp^3(m),
XRp^3(m)], OP4[XLp^4(m), XRp^4(m)], ..., OPN-2[XLp^(N-2)(m), XRp^(N-2)(m)],
OPN-1[XLp^(N-1)(m), XRp^(N-1)(m)], OPN[XLp^N(m), XRp^N(m)], and
OPN+1[XLp^(N+1)(m), XRp^(N+1)(m)]; where OPi[XLp^i(m), XRp^i(m)] indicates that OPi is
determined as a function of input pair XLp^i(m), XRp^i(m). Correspondingly, the outputs of
operation array 46 are Xp^1(m), Xp^2(m), Xp^3(m), Xp^4(m), ..., Xp^(N-2)(m), Xp^(N-1)(m),
Xp^N(m), and Xp^(N+1)(m) (collectively designated Xp^i(m)).

For i = 1 to i ≤ N/2, operations for each OPi of array 46 are determined in
accordance with complex expression 1 (CE1) as follows:

$$
\mathrm{Xp}^{i}(m) = \frac{\mathrm{XLp}^{i}(m) - \mathrm{XRp}^{i}(m)}{\exp\left[-j 2\pi \left(\tau_{i} + \cdots + \tau_{N/2}\right) f_{m}\right] - \exp\left[j 2\pi \left(\tau_{(N/2)+1} + \cdots + \tau_{N-i+1}\right) f_{m}\right]},
$$

where exp[argument] represents a natural exponent to the power of the argument, and
imaginary number j is the square root of -1. For i > ((N/2)+1) to i = N+1, operations of
operation array 46 are determined in accordance with complex expression 2 (CE2) as follows:

$$
\mathrm{Xp}^{i}(m) = \frac{\mathrm{XLp}^{i}(m) - \mathrm{XRp}^{i}(m)}{\exp\left[j 2\pi \left(\tau_{(N/2)+1} + \cdots + \tau_{i-1}\right) f_{m}\right] - \exp\left[-j 2\pi \left(\tau_{N-i+2} + \cdots + \tau_{N/2}\right) f_{m}\right]}.
$$

For i = (N/2)+1, neither CE1 nor CE2 is performed.

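An example of the determination of the operations for N = 4 (i=1 to i=N+1)
is as follows:

i = 1, CE1 applies as follows:

$$
\mathrm{Xp}^{1}(m) = \frac{\mathrm{XLp}^{1}(m) - \mathrm{XRp}^{1}(m)}{\exp\left[-j 2\pi \left(\tau_{1} + \tau_{2}\right) f_{m}\right] - \exp\left[j 2\pi \left(\tau_{3} + \tau_{4}\right) f_{m}\right]};
$$

i = 2 ≤ (N/2), CE1 applies as follows:

$$
\mathrm{Xp}^{2}(m) = \frac{\mathrm{XLp}^{2}(m) - \mathrm{XRp}^{2}(m)}{\exp\left[-j 2\pi \tau_{2} f_{m}\right] - \exp\left[j 2\pi \tau_{3} f_{m}\right]};
$$

i = 3: Not applicable, (N/2) < i ≤ ((N/2)+1);

i = 4, CE2 applies as follows:

$$
\mathrm{Xp}^{4}(m) = \frac{\mathrm{XLp}^{4}(m) - \mathrm{XRp}^{4}(m)}{\exp\left[j 2\pi \tau_{3} f_{m}\right] - \exp\left[-j 2\pi \tau_{2} f_{m}\right]}; \text{ and,}
$$

i = 5, CE2 applies as follows:

$$
\mathrm{Xp}^{5}(m) = \frac{\mathrm{XLp}^{5}(m) - \mathrm{XRp}^{5}(m)}{\exp\left[j 2\pi \left(\tau_{3} + \tau_{4}\right) f_{m}\right] - \exp\left[-j 2\pi \left(\tau_{1} + \tau_{2}\right) f_{m}\right]}.
$$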



Referring to FIGS. 1-3, each OPi of operation array 46 is defined to be representative 
of a different azimuthal position relative to reference axis R. The "center" operation, OPi 
where i = ((N/2)+1), represents the location of the reference axis and source 12. For the
example N=4, this center operation corresponds to i = 3. This arrangement is analogous to 
the different interaural time differences associated with a natural binaural hearing system. In 
these natural systems, there is a relative position in each sound passageway within the ear that 
corresponds to a maximum "in phase" peak for a given sound source. Accordingly, each 
operation of array 46 represents a position corresponding to a potential azimuthal or angular 
position range for a sound source, with the center operation representing a source at the zero 
azimuth — a source aligned with reference axis R. For an environment having a single source 
without noise or interference, determining the signal pair with the maximum strength may be 
sufficient to locate the source with little additional processing; however, in noisy or multiple 
source environments, further processing may be needed to properly estimate locations. 
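
The tap arrangement of dual delay line 40 and the expressions CE1 and CE2 described above
can be sketched in Python as follows. This is a minimal illustration under assumptions that
are not taken from the text: all delay increments are equal to a single value tau_s, a delay
of t seconds is modeled as multiplication of a spectral component at frequency f by
exp(-j2πft), the left signal at tap i has passed stages D_1 through D_(i-1), and the right
signal at tap i has passed stages D_N down through D_i. The function name
dual_delay_columns and the demo values are hypothetical.

```python
import cmath
import math


def dual_delay_columns(xl, xr, freqs_hz, tau_s, n_stages):
    """Evaluate CE1/CE2 for every column i except the center column.

    Assumes equal delay increments tau_s and the delay model described in
    the lead-in. Returns {i: [Xp^i(m) for each frequency index m]}.
    """
    if n_stages % 2:
        raise ValueError("n_stages must be even")
    half = n_stages // 2
    columns = {}
    for i in range(1, n_stages + 2):
        if i == half + 1:
            continue  # neither CE1 nor CE2 is performed at the center
        # With equal increments, both sums of tau in CE1 (or CE2) span k stages.
        k = abs(half + 1 - i)
        sign = -1.0 if i <= half else 1.0  # CE1 for i <= N/2, CE2 otherwise
        column = []
        for m, f in enumerate(freqs_hz):
            w = 2.0 * math.pi * f
            xl_i = xl[m] * cmath.exp(-1j * w * tau_s * (i - 1))
            xr_i = xr[m] * cmath.exp(-1j * w * tau_s * (n_stages - i + 1))
            den = (cmath.exp(sign * 1j * w * tau_s * k)
                   - cmath.exp(-sign * 1j * w * tau_s * k))
            column.append((xl_i - xr_i) / den)
        columns[i] = column
    return columns


# Assumed demo: N = 4, 20 microsecond increments, five analysis frequencies.
N_STAGES = 4
TAU_S = 20e-6
FREQS_HZ = [250.0, 500.0, 1000.0, 2000.0, 4000.0]

desired = [1.0 + 0.5j, 0.3 - 0.8j, -0.6 + 0.2j, 0.9 + 0.1j, -0.4 - 0.7j]
interferer = [10.0 * s.conjugate() for s in desired]  # 20 dB stronger

# The interferer reaches the left sensor N*tau later than the right sensor,
# so its two delayed copies coincide at column i = 1.
xl = [s + v * cmath.exp(-2j * math.pi * f * N_STAGES * TAU_S)
      for s, v, f in zip(desired, interferer, FREQS_HZ)]
xr = [s + v for s, v in zip(desired, interferer)]

columns = dual_delay_columns(xl, xr, FREQS_HZ, TAU_S, N_STAGES)
```

In this idealized case the magnitude of each entry of columns[1] matches the magnitude of
the corresponding desired component, while the other columns remain dominated by the
interferer. The demo frequencies were chosen so that no denominator is zero; a practical
implementation would need to handle frequencies at which a denominator vanishes.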

It should be understood that dual delay line 40 provides a two dimensional matrix of
outputs with N+1 columns corresponding to Xp^i(m), and M rows corresponding to each
discrete frequency f_m of Xp^i(m). This (N+1)×M matrix is determined for each short-term
spectral analysis interval p. Furthermore, by subtracting XRp^i(m) from XLp^i(m), the
denominator of each expression CE1, CE2 is arranged to provide a minimum value of Xp^i(m)
when the signal pair is "in-phase" at the given frequency f_m. Localization stage 70 uses this
aspect of expressions CE1, CE2 to evaluate the location of source 14 relative to source 12.

Localization stage 70 accumulates P number of these matrices to determine the
Xp^i(m) representative of the position of source 14. For each column i, localization stage 70
performs a summation of the amplitude of |Xp^i(m)| to the second power over frequencies f_m
from m=1 to m=M. The summation is then multiplied by the inverse of M to find an average
spectral energy as follows:

$$
\mathrm{Xavgp}^{i} = \frac{1}{M} \sum_{m=1}^{M} \left| \mathrm{Xp}^{i}(m) \right|^{2}.
$$

The resulting averages, Xavgp^i, are then time averaged over the P most recent
spectral-analysis time frames indexed by p in accordance with:

$$
X^{i} = \sum_{p=1}^{P} \gamma_{p} \, \mathrm{Xavgp}^{i},
$$

where γ_p are empirically determined weighting factors. In one embodiment, the γ_p factors
are preferably between 0.85^p and 0.90^p, where p is the short-term spectral analysis time frame
index. The X^i are analyzed to determine the minimum value, min(X^i). The index i of
min(X^i), designated "I," estimates the column representing the azimuthal location of source
14 relative to source 12.
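
The averaging and minimum search performed by localization stage 70 can be sketched in
Python as shown below. The function names and the sample data are hypothetical, and the
weighting γ_p = 0.875^p is only an assumed value chosen to fall between the 0.85^p and 0.90^p
bounds mentioned above; the text describes the factors as empirically determined.

```python
def average_spectral_energy(column):
    """Mean of |Xp^i(m)|^2 over the M frequencies of one column."""
    return sum(abs(x) ** 2 for x in column) / len(column)


def localize(frames, decay=0.875):
    """Return (I, scores) for a list of P frames, most recent first (p = 1).

    Each frame is a dict {i: [Xp^i(m) for each m]}. scores[i] is the
    weighted time average X^i, and I is the column index that minimizes it.
    The weighting gamma_p = decay**p is an assumption of this sketch.
    """
    scores = {}
    for p, columns in enumerate(frames, start=1):
        gamma_p = decay ** p
        for i, column in columns.items():
            scores[i] = scores.get(i, 0.0) + gamma_p * average_spectral_energy(column)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up data: P = 2 frames, M = 2 frequencies, columns 1, 2, 4, 5 (N = 4).
frames = [
    {1: [1 + 0j, 0.5j], 2: [3 + 0j, 2j], 4: [4 + 0j, 1 + 1j], 5: [2 + 0j, 2 + 0j]},
    {1: [1j, 1 + 0j], 2: [2 + 0j, 2j], 4: [3 + 0j, 3j], 5: [1 + 1j, 2 + 0j]},
]
best_column, scores = localize(frames)
```

Here column 1 has the smallest weighted energy, so best_column plays the role of the index
designated "I" in the text.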

It has been discovered that the spectral content of a desired signal from source 12, 
when approximately aligned with reference axis R1, can be estimated from Xp^I(m). In other
words, the spectral signal output by array 46 which most closely corresponds to the relative 
location of the "off-axis" source 14 contemporaneously provides a spectral representation of 
a signal emanating from source 12. As a result, the signal processing of dual delay line 40 
not only facilitates localization of source 14, but also provides a spectral estimate of the 
desired signal with only minimal post-localization processing to produce a representative 
output. 
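
One way to see why this can hold is to work through an idealized case. The following
derivation is an illustration under assumptions that are not part of the disclosure: N = 4,
all delay increments equal to τ, a delay of t modeled as multiplication by exp(-j2πf_m t),
an on-axis desired component S(m) reaching both sensors simultaneously, and a single
interfering component V(m) reaching sensor 22 a time 4τ after sensor 24, so that I = 1.
Writing ω = 2πf_m and assuming sin(2ωτ) is nonzero:

```latex
\begin{aligned}
\mathrm{XLp}^{1}(m) &= S(m) + V(m)\,e^{-j4\omega\tau}, \\
\mathrm{XRp}^{1}(m) &= \left[S(m) + V(m)\right] e^{-j4\omega\tau}, \\
\mathrm{XLp}^{1}(m) - \mathrm{XRp}^{1}(m) &= S(m)\left(1 - e^{-j4\omega\tau}\right), \\
e^{-j2\omega\tau} - e^{j2\omega\tau} &= -e^{j2\omega\tau}\left(1 - e^{-j4\omega\tau}\right), \\
\mathrm{Xp}^{1}(m) &= -S(m)\,e^{-j2\omega\tau}.
\end{aligned}
```

Under these assumptions the interfering component cancels in the numerator of CE1 and
|Xp^1(m)| equals |S(m)|. The residual sign and the fixed delay of 2τ follow from the delay
convention assumed here.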

Post-localization processing includes provision of a designation signal by localization
stage 70 to conceptual "switch" 80 to select the output column Xp^I(m) of the dual delay line
40. The Xp^I(m) is routed by switch 80 to an inverse Discrete Fourier Transform algorithm
(Inverse DFT) in stage 82 for conversion from a frequency domain signal representation to a
discrete time domain signal representation denoted as s(k). The signal estimate s(k) is then 
converted by Digital to Analog (D/A) converter 84 to provide an output signal to output 
device 90. 
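
The frequency-to-time conversion of stage 82 can be illustrated with a naive inverse DFT in
Python. This is only a sketch: the function names are hypothetical, and it assumes the
selected column has already been arranged as a full-length, conjugate-symmetric spectrum,
a mapping the text does not spell out. The example simply checks that a forward transform
followed by the inverse transform returns the original samples.

```python
import cmath
import math


def dft(samples):
    """Naive forward DFT, used here only to build a test spectrum."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse DFT returning complex time-domain samples."""
    n = len(spectrum)
    return [
        sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)) / n
        for k in range(n)
    ]


# Made-up real-valued frame standing in for the signal estimate s(k).
original = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.0, 0.5]
recovered_complex = inverse_dft(dft(original))

# For a conjugate-symmetric spectrum the imaginary parts are rounding noise.
recovered = [x.real for x in recovered_complex]
```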

Output device 90 amplifies the output signal from processor 30 with amplifier 92 and 
supplies the amplified signal to speaker 94 to provide the extracted signal from source 12.

It has been found that interference from off-axis sources separated by as little as 2
degrees from the on-axis source may be reduced or eliminated with the present invention,
even when the desired signal includes speech and the interference includes babble.
Moreover, the present invention provides for the extraction of desired signals even when the
interfering or noise signal is of equal or greater relative intensity. By moving sensors 22, 24
in tandem, the signal selected to be extracted may correspondingly be changed. Moreover, the
present invention may be employed in an environment having many sound sources in 
addition to sources 12, 14. In one alternative embodiment, the localization algorithm is 
configured to dynamically respond to relative positioning as well as relative strength, using 
automated learning techniques. In other embodiments, the present invention is adapted for 
use with highly directional microphones, more than two sensors to simultaneously extract 
multiple signals, and various adaptive amplification and filtering techniques known to those 
skilled in the art. 

The present invention greatly improves computational efficiency compared to 
conventional systems by determining a spectral signal representative of the desired signal as 
part of the localization processing. As a result, an output signal characteristic of a desired
signal from source 12 is determined as a function of the signal pair XLp^I(m), XRp^I(m)
corresponding to the separation of source 14 from source 12. Also, the exponents in the
denominator of CE1, CE2 correspond to the phase difference of frequencies f_m resulting from
the separation of source 12 from 14. Referring to the example of N=4 and assuming that I=1,
this phase difference is -2π(τ_1+τ_2)f_m (for delay line 42) and 2π(τ_3+τ_4)f_m (for delay line 44)
and corresponds to the separation of the representative location of off-axis source 14 from the
on-axis source 12 at i=3. Likewise the time increments, τ_1+τ_2 and τ_3+τ_4, correspond to the
separation of source 14 from source 12 for this example. Thus, processor 30 implements
dual delay line 40 and corresponding operational relationships CE1, CE2 to provide a means
for generating a desired signal by locating the position of an interfering signal source relative
to the source of the desired signal.

It is preferred that $\tau_i$ be selected to provide generally equal azimuthal positions
relative to reference axis R. In one embodiment, this arrangement corresponds to the values
of $\tau_i$ changing about 20% from the smallest to the largest value. In other embodiments, $\tau_i$ are
all generally equal to one another, simplifying the operations of array 46. Notably, the pair of
time increments in the numerator of CE1, CE2 corresponding to the separation of the sources
12 and 14 become approximately equal when all values $\tau_i$ are generally the same.

Processor 30 may be comprised of one or more components or pieces of equipment.
The processor may include digital circuits, analog circuits, or a combination of these circuit
types. Processor 30 may be programmable, an integrated state machine, or utilize a
combination of these techniques. Preferably, processor 30 is a solid state integrated digital
signal processor circuit customized to perform the process of the present invention with a
minimum of external components and connections. Similarly, the extraction process of the
present invention may be performed on variously arranged processing equipment configured
to provide the corresponding functionality with one or more hardware modules, firmware
modules, software modules, or a combination thereof. Moreover, as used herein, "signal"
includes, but is not limited to, software, firmware, hardware, programming variable,
communication channel, and memory location representations.

Referring to FIG. 4A, one application of the present invention is depicted as hearing
aid system 110. System 110 includes eyeglasses G with microphones 122 and 124 fixed to
glasses G and displaced from one another. Microphones 122, 124 are operatively coupled to
hearing aid processor 130. Processor 130 is operatively coupled to output device 190.
Output device 190 is positioned in ear E to provide an audio signal to the wearer.

Microphones 122, 124 are utilized in a manner similar to sensors 22, 24 of the
embodiment depicted by FIGS. 1-3. Similarly, processor 130 is configured with the signal
extraction process depicted in FIGS. 1-3. Processor 130 provides the extracted signal to
output device 190 to provide an audio output to the wearer. The wearer of system 110 may
position glasses G to align with a desired sound source, such as a speech signal, to reduce
interference from a nearby noise source off axis from the midpoint between microphones
122, 124. Moreover, the wearer may select a different signal by realigning with another
desired sound source to reduce interference from a noisy environment.

Processor 130 and output device 190 may be separate units (as depicted) or included
in a common unit worn in the ear. The coupling between processor 130 and output device
190 may be an electrical cable or a wireless transmission. In one alternative embodiment,
sensors 122, 124 and processor 130 are remotely located and are configured to broadcast to
one or more output devices 190 situated in the ear E via a radio frequency transmission or
other conventional telecommunication method.

FIG. 4B shows a voice recognition system 210 employing the present invention as a 
front end speech enhancement device. System 210 includes personal computer C with two 
microphones 222, 224 spaced apart from each other in a predetermined relationship. 
Microphones 222, 224 are operatively coupled to a processor 230 within computer C. 
Processor 230 provides an output signal for internal use or responsive reply via speakers 
294a, 294b or visual display 296. An operator aligns in a predetermined relationship with 
microphones 222, 224 of computer C to deliver voice commands. Computer C is configured 
to receive these voice commands, extracting the desired voice command from a noisy 
environment in accordance with the process system of FIGS. 1-3. 

Referring to Figs. 10-13, signal processing system 310 of another embodiment of the
present invention is illustrated. Reference numerals of system 310 that are the same as those
of system 10 refer to like features. The signal flow diagram of Fig. 10 corresponds to
various signal processing techniques of system 310. Fig. 10 depicts left "L" and right "R"
input channels for signal processor 330 of system 310. Channels L, R each include an
acoustic sensor 22, 24 that provides an input signal $x_{Ln}(t)$, $x_{Rn}(t)$, respectively. Input signals
$x_{Ln}(t)$ and $x_{Rn}(t)$ correspond to composites of sounds from multiple acoustic sources located
within the detection range of sensors 22, 24. As described in connection with FIG. 1 of
system 10, it is preferred that sensors 22, 24 be standard microphones spaced apart from each
other at a predetermined distance D. In other embodiments a different sensor type or
arrangement may be employed as would occur to those skilled in the art.

Sensors 22, 24 are operatively coupled to processor 330 of system 310 to provide input
signals $x_{Ln}(t)$ and $x_{Rn}(t)$ to A/D converters 34a, 34b. A/D converters 34a, 34b of processor
330 convert input signals $x_{Ln}(t)$ and $x_{Rn}(t)$ from an analog form to a discrete form
represented as $x_{Ln}(k)$ and $x_{Rn}(k)$, respectively, where "t" is the familiar continuous time
domain variable and "k" is the familiar discrete sample index variable. A corresponding pair
of preconditioning filters (not shown) may also be included in processor 330 as described in
connection with system 10.

Discrete Fourier Transform (DFT) stages 36a, 36b receive the digitized input signal pair
$x_{Ln}(k)$ and $x_{Rn}(k)$ from converters 34a, 34b, respectively. Stages 36a, 36b transform input
signals $x_{Ln}(k)$ and $x_{Rn}(k)$ into spectral signals designated $X_{Ln}(m)$ and $X_{Rn}(m)$ using a short
term discrete Fourier transform algorithm. Spectral signals $X_{Ln}(m)$ and $X_{Rn}(m)$ are expressed
in terms of a number of discrete frequency components indexed by integer m, where m = 1, 2,
..., M. Also, as used herein, the subscripts L and R denote the left and right channels,
respectively, and n indexes time frames for the discrete Fourier transform analysis.

Delay operator 340 receives spectral signals $X_{Ln}(m)$ and $X_{Rn}(m)$ from stages 36a, 36b,
respectively. Delay operator 340 includes a number of dual delay lines (DDLs) 342 each
corresponding to a different one of the component frequencies indexed by m. Thus, there are
M different dual delay lines 342 utilized. However, only dual delay lines 342 corresponding
to m=1 and m=M are shown in Fig. 10 to preserve clarity. The remaining dual delay lines
corresponding to m=2 through m=(M-1) are represented by an ellipsis.
Alternatively, delay operator 340 may be described as a single dual delay line that
simultaneously operates on M frequencies like dual delay line 40 of system 10.

The pair of frequency components from DFT stages 36a, 36b corresponding to a given
value of m are inputs into a corresponding one of dual delay lines 342. For the examples
illustrated in Fig. 10, spectral signal component pair $X_{Ln}(m=1)$ and $X_{Rn}(m=1)$ is sent to the
upper dual delay line 342 for the frequency corresponding to m=1; and spectral signal
component pair $X_{Ln}(m=M)$ and $X_{Rn}(m=M)$ is sent to the lower dual delay line 342 for the
frequency corresponding to m=M. Likewise, common frequency component pairs of $X_{Ln}(m)$
and $X_{Rn}(m)$ for frequencies corresponding to m=2 through m=(M-1) are each sent to a
corresponding dual delay line as represented by ellipses to preserve clarity.

Referring additionally to Fig. 11, certain features of dual delay line 342 are further
illustrated. Each dual delay line 342 includes a left channel delay line 342a receiving a
corresponding frequency component input from DFT stage 36a and right channel delay line
342b receiving a corresponding frequency component input from DFT stage 36b. Delay lines
342a, 342b each include an odd number I of delay stages 344 indexed by i = 1, 2, ..., I. The I
number of delayed signal pairs are provided on outputs 345 of delay stages 344 and are
correspondingly sent to complex multipliers 346. There is one multiplier 346 corresponding
to each delay stage 344 for each delay line 342a, 342b. Multipliers 346 provide equalization
weighting for the corresponding outputs of delay stages 344. Each delayed signal pair from
corresponding outputs 345 has one member from a delay stage 344 of left delay line 342a and
the other member from a delay stage 344 of right delay line 342b. Complex multipliers 346
of each dual delay line 342 output corresponding products of the I number of delayed signal
pairs along taps 347. The I number of signal pairs from taps 347 for each dual delay line 342
of operator 340 are input to signal operator 350.

For each dual delay line 342, the I number of pairs of multiplier taps 347 are each 
input to a different Operation Array (OA) 352 of operator 350. Each pair of taps 347 is 
provided to a different operation stage 354 within a corresponding operation array 352. In 
Fig. 11, only a portion of delay stages 344, multipliers 346, and operation stages 354 are 
shown corresponding to the two stages at either end of delay lines 342a, 342b and the middle 
stages of delay lines 342a, 342b. The intervening stages follow the pattern of the illustrated 
stages and are represented by ellipses to preserve clarity. 

For an arbitrary frequency $\omega_m$, delay times $\tau_i$ are given by equation (1) as follows:

$$\tau_i = \frac{ITD_{max}}{2}\sin\left(\frac{i-1}{I-1}\pi - \frac{\pi}{2}\right), \quad i = 1, \ldots, I, \tag{1}$$

where, i is the integer delay stage index in the range (i = 1, ..., I); $ITD_{max} = D/c$ is the
maximum Intermicrophone Time Difference; D is the distance between sensors 22, 24; and c
is the speed of sound. Further, delay times $\tau_i$ are antisymmetric with respect to the midpoint
of the delay stages corresponding to i = (I+1)/2 as indicated in the following equation (2):

$$\tau_{I-i+1} = \frac{ITD_{max}}{2}\sin\left(\frac{(I-i+1)-1}{I-1}\pi - \frac{\pi}{2}\right) = -\frac{ITD_{max}}{2}\sin\left(\frac{i-1}{I-1}\pi - \frac{\pi}{2}\right) = -\tau_i. \tag{2}$$

The azimuthal plane may be uniformly divided into I sectors with the azimuth position of
each resulting sector being given by equation (3) as follows:

$$\theta_i = \frac{i-1}{I-1}\,180^\circ - 90^\circ, \quad i = 1, \ldots, I. \tag{3}$$

The azimuth positions in auditory space may be mapped to corresponding delayed signal
pairs along each dual delay line 342 in accordance with equation (4) as follows:

$$\tau_i = \frac{ITD_{max}}{2}\sin\theta_i, \quad i = 1, \ldots, I. \tag{4}$$
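The mapping of equations (1) through (4) can be illustrated with a minimal Python sketch. The function name `delay_line_geometry`, the sensor spacing of 0.15 m, and the speed of sound of 343 m/s are assumptions chosen only for illustration; they are not values taken from this description.

```python
import math

def delay_line_geometry(num_stages, mic_distance_m, speed_of_sound=343.0):
    """Return (azimuths in degrees, delays in seconds) for stages i = 1..I.

    mic_distance_m and speed_of_sound are illustrative assumptions.
    """
    itd_max = mic_distance_m / speed_of_sound  # ITD_max = D / c
    azimuths, delays = [], []
    for i in range(1, num_stages + 1):
        theta = (i - 1) / (num_stages - 1) * 180.0 - 90.0    # equation (3)
        tau = 0.5 * itd_max * math.sin(math.radians(theta))  # equation (4)
        azimuths.append(theta)
        delays.append(tau)
    return azimuths, delays

# Example with I = 5 stages (I is odd, as stated for delay lines 342a, 342b).
azimuths, delays = delay_line_geometry(num_stages=5, mic_distance_m=0.15)
```

The returned delays are antisymmetric about the middle stage, as equation (2) requires, and the middle stage has zero delay.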



The dual delay-line structure is similar to the embodiment of system 10, except that a
different dual delay line is represented for each value of m and multipliers 346 have been
included to multiply each corresponding delay stage 344 by an appropriate one of
equalization factors $\alpha_i(m)$, where i is the delay stage index previously described. Preferably,
elements $\alpha_i(m)$ are selected to compensate for differences in the noise intensity at sensors 22,
24 as a function of both azimuth and frequency.

One preferred embodiment for determining equalization factors $\alpha_i(m)$ assumes
amplitude compensation is independent of frequency, regarding any departure from this
model as being negligible. For this embodiment, the amplitude of the received sound
pressure $|p|$ varies with the source-receiver distance r in accordance with equations (A1)
and (A2) as follows:

$$|p| \propto \frac{1}{r}, \tag{A1}$$

$$\frac{|p_L|}{|p_R|} = \frac{r_R}{r_L}, \tag{A2}$$

where $|p_L|$ and $|p_R|$ are the amplitudes of sound pressures at sensors 22, 24. Fig. 12 depicts
sensors 22, 24 and a representative acoustic source S1 within the range of reception to
provide input signals $x_{Ln}(t)$ and $x_{Rn}(t)$. According to the geometry illustrated in Fig. 12, the
distances $r_L$ and $r_R$ from the source S1 to the left and right sensors, respectively, are given by
equations (A3) and (A4), as follows:

$$r_L = \sqrt{(l\sin\theta_i + D/2)^2 + (l\cos\theta_i)^2} = \sqrt{l^2 + lD\sin\theta_i + D^2/4}, \tag{A3}$$

$$r_R = \sqrt{(l\sin\theta_i - D/2)^2 + (l\cos\theta_i)^2} = \sqrt{l^2 - lD\sin\theta_i + D^2/4}. \tag{A4}$$

For a given delayed signal pair in the dual delay-line 342 of Fig. 11 to become
equalized under this approach, the factors $\alpha_i(m)$ and $\alpha_{I-i+1}(m)$ must satisfy equation (A5) as
follows:

$$|p_L|\,\alpha_i(m) = |p_R|\,\alpha_{I-i+1}(m). \tag{A5}$$

Substituting equation (A2) into equation (A5), equation (A6) results as follows:

$$\frac{r_L}{r_R} = \frac{\alpha_i(m)}{\alpha_{I-i+1}(m)}. \tag{A6}$$

By defining the value of $\alpha_i(m)$ in accordance with equation (A7) as follows:

$$\alpha_i(m) = K\sqrt{l^2 + lD\sin\theta_i + D^2/4}, \tag{A7}$$

where, K is in units of inverse length and is chosen to provide a convenient amplitude level,
the value of $\alpha_{I-i+1}(m)$ is given by equation (A8) as follows:

$$\alpha_{I-i+1}(m) = K\sqrt{l^2 + lD\sin\theta_{I-i+1} + D^2/4} = K\sqrt{l^2 - lD\sin\theta_i + D^2/4}, \tag{A8}$$

where, the relation $\sin\theta_{I-i+1} = -\sin\theta_i$ can be obtained by substituting I-i+1 into i in equation (3).
By substituting equations (A7) and (A8) into equation (A6), it may be verified that the values
assigned to $\alpha_i(m)$ in equation (A7) satisfy the condition established by equation (A6).
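The verification described for equations (A6) through (A8) can be checked numerically. The following Python sketch is illustrative only: the function names, the source distance of 1.0 m, and the spacing of 0.15 m are assumptions, and the source distance l is taken to be measured from the midpoint between the sensors, as the geometry of equations (A3) and (A4) implies.

```python
import math

def equalization_factors(num_stages, source_distance, mic_distance, scale=1.0):
    """alpha_i per equation (A7) for i = 1..I (returned list is 0-indexed).

    scale plays the role of K; all numeric inputs are illustrative assumptions.
    """
    alphas = []
    for i in range(1, num_stages + 1):
        theta = math.radians((i - 1) / (num_stages - 1) * 180.0 - 90.0)  # eq. (3)
        alphas.append(scale * math.sqrt(
            source_distance ** 2
            + source_distance * mic_distance * math.sin(theta)
            + mic_distance ** 2 / 4.0))
    return alphas

def path_lengths(theta_deg, source_distance, mic_distance):
    """r_L and r_R per equations (A3) and (A4)."""
    theta = math.radians(theta_deg)
    x = source_distance * math.sin(theta)
    y = source_distance * math.cos(theta)
    return (math.hypot(x + mic_distance / 2.0, y),
            math.hypot(x - mic_distance / 2.0, y))

alphas = equalization_factors(num_stages=9, source_distance=1.0, mic_distance=0.15)
```

For every stage i, the ratio `alphas[i - 1] / alphas[9 - i]` matches the ratio of the two path lengths, which is the condition of equation (A6).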

After obtaining the equalization factors $\alpha_i(m)$ in accordance with this embodiment,
minor adjustments are preferably made to calibrate for asymmetries in the sensor 
arrangement and other departures from the ideal case such as those that might result from 
media absorption of acoustic energy, an acoustic source geometry other than a point source, 
and dependence of amplitude decline on parameters other than distance. 

After equalization by factors $\alpha_i(m)$ with multipliers 346, the in-phase desired signal
component is generally the same in the left and right channels of the dual delay lines 342 for
the delayed signal pairs corresponding to $i = i_{signal} = s$, and the in-phase noise signal
component is generally the same in the left and right channels of the dual delay lines 342 for
the delayed signal pairs corresponding to $i = i_{noise} = g$ for the case of a single, predominant
interfering noise source. The desired signal at i=s may be expressed as
$S_n(m) = A_s \exp[j(\omega_m t + \phi_s)]$; and the interfering signal at i=g may be expressed as
$G_n(m) = A_g \exp[j(\omega_m t + \phi_g)]$, where $\phi_s$ and $\phi_g$ denote initial phases. Based on these
models, equalized signals $\alpha_i(m)X_{Ln}^{(i)}(m)$ for the left channel and $\alpha_{I-i+1}(m)X_{Rn}^{(i)}(m)$ for the
right channel at any arbitrary point i (except i = s) along dual delay lines 342 may be
expressed in equations (5) and (6) as follows:

$$\alpha_i(m)X_{Ln}^{(i)}(m) = A_s \exp j[\omega_m(t + \tau_s - \tau_i) + \phi_s] + A_g \exp j[\omega_m(t + \tau_g - \tau_i) + \phi_g], \tag{5}$$

$$\alpha_{I-i+1}(m)X_{Rn}^{(i)}(m) = A_s \exp j[\omega_m(t + \tau_{I-s+1} - \tau_{I-i+1}) + \phi_s] + A_g \exp j[\omega_m(t + \tau_{I-g+1} - \tau_{I-i+1}) + \phi_g], \tag{6}$$

where equations (7) and (8) further define certain terms of equations (5) and (6) as follows:

$$X_{Ln}^{(i)}(m) = X_{Ln}(m)\exp(-j2\pi f_m \tau_i), \tag{7}$$

$$X_{Rn}^{(i)}(m) = X_{Rn}(m)\exp(-j2\pi f_m \tau_{I-i+1}). \tag{8}$$



Each signal pair $\alpha_i(m)X_{Ln}^{(i)}(m)$ and $\alpha_{I-i+1}(m)X_{Rn}^{(i)}(m)$ is input to a corresponding
operation stage 354 of a corresponding one of operation arrays 352 for all m, where each
operation array 352 corresponds to a different value of m as in the case of dual delay lines 342.
For a given operation array 352, operation stages 354 corresponding to each value of i,
except i=s, perform the operation defined by equation (9) as follows:

$$X_n^{(i)}(m) = \frac{\alpha_i(m)X_{Ln}^{(i)}(m) - \alpha_{I-i+1}(m)X_{Rn}^{(i)}(m)}{(\alpha_i/\alpha_s)\exp[j\omega_m(\tau_s - \tau_i)] - (\alpha_{I-i+1}/\alpha_{I-s+1})\exp[j\omega_m(\tau_{I-s+1} - \tau_{I-i+1})]}, \quad \text{for } i \neq s. \tag{9}$$

If the value of the denominator in equation (9) is too small, a small positive constant $\epsilon$ is
added to the denominator to limit the magnitude of the output signal $X_n^{(i)}(m)$. No operation is
performed by the operation stage 354 on the signal pair corresponding to i=s for all m (all
operation arrays 352 of signal operator 350).

Equation (9) is comparable to the expressions CE1 and CE2 of system 10; however,
equation (9) includes equalization elements $\alpha_i(m)$ and is organized into a single expression.
With the outputs from operation array 352, the simultaneous localization and identification of
the spectral content of the desired signal may be performed with system 310. Localization
and extraction with system 310 are further described by the signal flow diagram of Fig. 13
and the following mathematical model. By substituting equations (5) and (6) into equation
(9), equation (10) results as follows:

$$X_n^{(i)}(m) = S_n(m) + G_n(m)\cdot v_{s,g}^{(i)}(m), \quad i \neq s, \tag{10}$$

where equation (11) further defines:

$$v_{s,g}^{(i)}(m) = \frac{(\alpha_i/\alpha_g)\exp[j\omega_m(\tau_g - \tau_i)] - (\alpha_{I-i+1}/\alpha_{I-g+1})\exp[j\omega_m(\tau_{I-g+1} - \tau_{I-i+1})]}{(\alpha_i/\alpha_s)\exp[j\omega_m(\tau_s - \tau_i)] - (\alpha_{I-i+1}/\alpha_{I-s+1})\exp[j\omega_m(\tau_{I-s+1} - \tau_{I-i+1})]}, \quad i \neq s. \tag{11}$$

By applying equation (2) to equation (11), equation (12) results as follows:

$$v_{s,g}^{(i)}(m) = \frac{(\alpha_i/\alpha_g)\exp[j\omega_m(\tau_g - \tau_i)] - (\alpha_{I-i+1}/\alpha_{I-g+1})\exp[-j\omega_m(\tau_g - \tau_i)]}{(\alpha_i/\alpha_s)\exp[j\omega_m(\tau_s - \tau_i)] - (\alpha_{I-i+1}/\alpha_{I-s+1})\exp[-j\omega_m(\tau_s - \tau_i)]}, \quad i \neq s. \tag{12}$$



The energy of the signal $X_n^{(i)}(m)$ is expressed in equation (13) as follows:

$$|X_n^{(i)}(m)|^2 = |S_n(m) + G_n(m)\cdot v_{s,g}^{(i)}(m)|^2. \tag{13}$$

A signal vector may be defined:

$$\mathbf{x}^{(i)} = \left(X_1^{(i)}(1), X_1^{(i)}(2), \ldots, X_1^{(i)}(M), X_2^{(i)}(1), \ldots, X_2^{(i)}(M), \ldots, X_N^{(i)}(1), \ldots, X_N^{(i)}(M)\right)^T, \quad i = 1, \ldots, I,$$

where, T denotes transposition. The energy $\|\mathbf{x}^{(i)}\|_2^2$ of the vector is given by equation
(14) as follows:

$$\|\mathbf{x}^{(i)}\|_2^2 = \sum_{n=1}^{N}\sum_{m=1}^{M} |X_n^{(i)}(m)|^2, \quad i = 1, \ldots, I. \tag{14}$$

Equation (14) is a double summation over time and frequency that approximates a double
integration in a continuous time domain representation.
Further defining the following vectors:

$$\mathbf{s} = \left(S_1(1), S_1(2), \ldots, S_1(M), S_2(1), \ldots, S_2(M), \ldots, S_N(1), \ldots, S_N(M)\right)^T, \text{ and}$$

$$\mathbf{g}^{(i)} = \left(G_1(1)v_{s,g}^{(i)}(1), \ldots, G_1(M)v_{s,g}^{(i)}(M), G_2(1)v_{s,g}^{(i)}(1), \ldots, G_2(M)v_{s,g}^{(i)}(M), \ldots, G_N(1)v_{s,g}^{(i)}(1), \ldots, G_N(M)v_{s,g}^{(i)}(M)\right)^T, \text{ where } i = 1, \ldots, I,$$

the energy of vectors $\mathbf{s}$ and $\mathbf{g}^{(i)}$ are respectively defined by equations (15) and (16) as follows:

$$\|\mathbf{s}\|_2^2 = \sum_{n=1}^{N}\sum_{m=1}^{M} |S_n(m)|^2, \tag{15}$$

$$\|\mathbf{g}^{(i)}\|_2^2 = \sum_{n=1}^{N}\sum_{m=1}^{M} |G_n(m)\cdot v_{s,g}^{(i)}(m)|^2. \tag{16}$$

For a desired signal that is independent of the interfering source, the vectors $\mathbf{s}$ and
$\mathbf{g}^{(i)}$ are orthogonal. In accordance with the Theorem of Pythagoras, equation (17) results as
follows:

$$\|\mathbf{x}^{(i)}\|_2^2 = \|\mathbf{s}\|_2^2 + \|\mathbf{g}^{(i)}\|_2^2. \tag{17}$$

Because $\|\mathbf{g}^{(i)}\|_2^2 \geq 0$, equation (18) results as follows:

$$\|\mathbf{x}^{(i)}\|_2^2 \geq \|\mathbf{s}\|_2^2. \tag{18}$$

The equality in equation (18) is satisfied only when $\|\mathbf{g}^{(i)}\|_2^2 = 0$, which happens if either of
the following two conditions are met: (a) $G_n(m) = 0$, i.e., the noise source is silent, in which
case there is no need for doing localization of the noise source and noise cancellation; and (b)
$v_{s,g}^{(i)}(m) = 0$, where equation (12) indicates that this second condition arises for $i = g = i_{noise}$.
Therefore, $\|\mathbf{x}^{(i)}\|_2^2$ has its minimum at $i = g = i_{noise}$, which according to equation (18) is
$\|\mathbf{s}\|_2^2$. Equation (19) further describes this condition as follows:

$$\|\mathbf{x}^{(i_{noise})}\|_2^2 = \min_i \|\mathbf{x}^{(i)}\|_2^2 = \|\mathbf{s}\|_2^2. \tag{19}$$



Thus, the localization procedure includes finding the position $i_{noise}$ along the operation
array 352 for each of the delay lines 342 that produces the minimum value of $\|\mathbf{x}^{(i)}\|_2^2$. Once
the location $i_{noise}$ along the dual delay line 342 is determined, the azimuth position of the
noise source may be determined with equation (3). The estimated noise location $i_{noise}$ may be
utilized for noise cancellation or extraction of the desired signal as further described
hereinafter. Indeed, operation stages 354 for all m corresponding to $i = i_{noise}$ provide the
spectral components of the desired signal as given by equation (20):

$$\hat{S}_n(m) = X_n^{(i_{noise})}(m) = S_n(m) + G_n(m)\cdot v_{s,g}^{(i_{noise})}(m) = S_n(m). \tag{20}$$
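The cancellation expressed by equations (9) and (20) can be demonstrated with a small Python sketch for a single frequency. This is a simplified illustration, not the implementation of processor 330: all equalization factors are set to unity, stage indices are 0-based, and the helper names, the sensor spacing of 0.15 m, the speed of sound of 343 m/s, and the source amplitudes are assumptions.

```python
import cmath
import math

def stage_delays(num_stages, itd_max):
    """Delays of equation (1), 0-indexed (tau[0] corresponds to i = 1)."""
    return [0.5 * itd_max * math.sin(i / (num_stages - 1) * math.pi - math.pi / 2)
            for i in range(num_stages)]

def sensor_spectra(sources, omega, tau):
    """Left/right spectra for (complex amplitude, 0-indexed stage) sources,
    following the signal model of equations (5) and (6)."""
    last = len(tau) - 1
    x_left = sum(a * cmath.exp(1j * omega * tau[p]) for a, p in sources)
    x_right = sum(a * cmath.exp(1j * omega * tau[last - p]) for a, p in sources)
    return x_left, x_right

def stage_output(x_left, x_right, omega, tau, i, s, eps=1e-12):
    """Equation (9) with unity equalization factors, valid for i != s."""
    last = len(tau) - 1
    xl_i = x_left * cmath.exp(-1j * omega * tau[i])          # equation (7)
    xr_i = x_right * cmath.exp(-1j * omega * tau[last - i])  # equation (8)
    denom = (cmath.exp(1j * omega * (tau[s] - tau[i]))
             - cmath.exp(1j * omega * (tau[last - s] - tau[last - i])))
    if abs(denom) < eps:
        denom += eps  # small positive constant limits the output magnitude
    return (xl_i - xr_i) / denom

tau = stage_delays(num_stages=9, itd_max=0.15 / 343.0)
omega = 2 * math.pi * 1000.0
desired, noise = 1 + 0.5j, 2 - 1j
x_left, x_right = sensor_spectra([(desired, 4), (noise, 6)], omega, tau)
estimate = stage_output(x_left, x_right, omega, tau, i=6, s=4)
```

With the desired source at the middle stage and the noise source at stage index 6, the output of the stage at the noise position reproduces the desired component, while stages at other positions retain a noise residue.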

Localization operator 360 embodies the localization technique of system 310. Fig. 13
further depicts operator 360 with coupled pairs of summation operators 362 and 364 for each
value of integer index i, where i = 1, ..., I. Collectively, summation operators 362 and 364
perform the operation corresponding to equation (14) to generate $\|\mathbf{x}^{(i)}\|_2^2$ for each value of i.
For each transform time frame n, the summation operators 362 each receive $X_n^{(i)}(1)$ through
$X_n^{(i)}(M)$ inputs from operation stages 354 corresponding to their value of i and sum over
frequencies m=1 through m=M. For the illustrated example, the upper summation operator
362 corresponds to i=1 and receives signals $X_n^{(1)}(1)$ through $X_n^{(1)}(M)$ for summation; and the
lower summation operator 362 corresponds to i=I and receives signals $X_n^{(I)}(1)$ through
$X_n^{(I)}(M)$ for summation.

Each summation operator 364 receives the results for each transform time frame n
from the summation operator 362 corresponding to the same value of i and accumulates a
sum of the results over time corresponding to n=1 through n=N transform time frames, where
N is a quantity of time frames empirically determined to be suitable for localization. For the
illustrated example, the upper summation operator 364 corresponds to i=1 and sums the
results from the upper summation operator 362 over N samples; and the lower summation
operator 364 corresponds to i=I and sums the results from the lower summation operator 362
over N samples.

The I number of values of $\|\mathbf{x}^{(i)}\|_2^2$ resulting from the I number of summation
operators 364 are received by stage 366. Stage 366 compares the I number of $\|\mathbf{x}^{(i)}\|_2^2$ values
to determine the value of i corresponding to the minimum $\|\mathbf{x}^{(i)}\|_2^2$. This value of i is output
by stage 366 as $i = g = i_{noise}$.
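The accumulation and comparison performed by summation operators 362, 364 and stage 366 can be sketched in Python as follows. The data layout `stage_outputs[n][i][m]`, the function name, and the choice to skip the signal stage (where no operation is performed) are assumptions made for illustration.

```python
def localize_by_energy(stage_outputs, signal_index):
    """Return (noise stage index, per-stage energies).

    stage_outputs[n][i][m] is the complex output of operation stage i for
    time frame n and frequency index m (all indices 0-based, an assumption).
    The stage at signal_index is skipped, since no operation is defined there.
    """
    num_stages = len(stage_outputs[0])
    energy = [0.0] * num_stages
    for frame in stage_outputs:                  # accumulation over frames n
        for i, spectrum in enumerate(frame):
            if i == signal_index:
                continue
            energy[i] += sum(abs(x) ** 2 for x in spectrum)  # sum over m
    candidates = [i for i in range(num_stages) if i != signal_index]
    return min(candidates, key=lambda i: energy[i]), energy

# Two identical frames, four stages, three frequency components per stage.
frame = [[1 + 1j, 2, 0.5], [0, 0, 0], [0.1, 0.1j, 0], [3, 1, 1]]
noise_index, energy = localize_by_energy([frame, frame], signal_index=1)
```

Here the stage at index 2 carries the least accumulated energy, so it is reported as the noise position.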

Referring back to Fig. 10, post-localization processing by system 310 is further
described. When equation (9) is applied to the pair inputs of delay lines 342 at i=g, it
corresponds to the position of the off-axis noise source and equation (20) shows it provides
an approximation of the desired signal $\hat{S}_n(m)$. To extract signal $\hat{S}_n(m)$, the index value i=g is
sent by stage 366 of localization unit 360 to extraction operator 380. In response to g,
extraction operator 380 routes the outputs $X_n^{(g)}(1)$ through $X_n^{(g)}(M) = \hat{S}_n(m)$ to Inverse
Fourier Transform (IFT) stage 82 operatively coupled thereto. For this purpose, extraction
operator 380 preferably includes a multiplexer or matrix switch that has I×M complex inputs
and M complex outputs, where a different set of M inputs is routed to the outputs for each
different value of the index i in response to the output from stage 366 of localization operator
360.

Stage 82 converts the M spectral components received from extraction unit 380 to
transform the spectral approximation of the desired signal, $\hat{S}_n(m)$, from the frequency domain
to the time domain as represented by signal $\hat{S}_n(k)$. Stage 82 is operatively coupled to
digital-to-analog (D/A) converter 84. D/A converter 84 receives signal $\hat{S}_n(k)$ for conversion from a
discrete form to an analog form represented by $\hat{S}_n(t)$. Signal $\hat{S}_n(t)$ is input to output device 90
to provide an auditory representation of the desired signal or other indicia as would occur to
those skilled in the art. Stage 82, converter 84, and device 90 are further described in
connection with system 10.

Another form of expression of equation (9) is given by equation (21) as follows:

$$X_n^{(i)}(m) = w_{Ln}(m)X_{Ln}^{(i)}(m) + w_{Rn}(m)X_{Rn}^{(i)}(m). \tag{21}$$

The terms $w_{Ln}(m)$ and $w_{Rn}(m)$ are equivalent to beamforming weights for the left and right channels,
respectively. As a result, the operation of equation (9) may be equivalently modeled as a
beamforming procedure that places a null at the location corresponding to the predominant
noise source, while steering to the desired output signal $\hat{S}_n(t)$.

Fig. 14 depicts system 410 of still another embodiment of the present invention.
System 410 is depicted with several reference numerals that are the same as those used in
connection with systems 10 and 310 and are intended to designate like features. A number of
acoustic sources 412, 414, 416, 418 are depicted in Fig. 14 within the reception range of
acoustic sensors 22, 24 of system 410. The positions of sources 412, 414, 416, 418 are also
represented by the azimuth angles relative to axis AZ that are designated with reference
numerals 412a, 414a, 416a, 418a. As depicted, angles 412a, 414a, 416a, 418a correspond to
about 0°, +20°, +75°, and -75°, respectively. Sensors 22, 24 are operatively coupled to signal
processor 430 with axis AZ extending about midway therebetween. Processor 430 receives
input signals $x_{Ln}(t)$, $x_{Rn}(t)$ from sensors 22, 24 corresponding to left channel L and right
channel R as described in connection with system 310. Processor 430 processes signals
$x_{Ln}(t)$, $x_{Rn}(t)$ and provides corresponding output signals to output devices 90, 490 operatively
coupled thereto.

Referring additionally to the signal flow diagram of Fig. 15, selected features of
system 410 are further illustrated. System 410 includes A/D converters 34a, 34b and DFT
stages 36a, 36b to provide the same left and right channel processing as described in
connection with system 310. System 410 includes delay operator 340 and signal operator
350 as described for system 310; however, it is preferred that equalization factors $\alpha_i(m)$ (i = 1,
..., I) be set to unity for the localization processes associated with localization operator 460
of system 410. Furthermore, localization operator 460 of system 410 directly receives the
output signals of delay operator 340 instead of the output signals of signal operator 350,
unlike system 310.

The localization technique embodied in operator 460 begins by establishing
two-dimensional (2-D) plots of coincidence loci in terms of frequency versus azimuth position.
The coincidence points of each locus represent a minimum difference between the left and
right channels for each frequency as indexed by m. This minimum difference may be
expressed as the minimum magnitude difference $\delta X_n^{(i)}(m)$ between the frequency domain
representations $X_{Ln}^{(i)}(m)$ and $X_{Rn}^{(i)}(m)$, at each discrete frequency m, yielding M/2 potentially
different loci. If the acoustic sources are spatially coherent, then these loci will be the same
across all frequencies. This operation is described in equations (22)-(25) as follows:

$$i_n(m) = \arg\min_i \{\delta X_n^{(i)}(m)\}, \quad m = 1, \ldots, M/2, \tag{22}$$

$$\delta X_n^{(i)}(m) = |X_{Ln}^{(i)}(m) - X_{Rn}^{(i)}(m)|, \quad i = 1, \ldots, I;\; m = 1, \ldots, M/2, \tag{23}$$

$$X_{Ln}^{(i)}(m) = X_{Ln}(m)\exp(-j2\pi\tau_i m/M), \quad i = 1, \ldots, I;\; m = 1, \ldots, M/2, \tag{24}$$

$$X_{Rn}^{(i)}(m) = X_{Rn}(m)\exp(-j2\pi\tau_{I-i+1} m/M), \quad i = 1, \ldots, I;\; m = 1, \ldots, M/2. \tag{25}$$



If the amplitudes of the left and right channels are generally the same at a given
position along dual delay lines 342 of system 410 as indexed by i, then the value of $\delta X_n^{(i)}(m)$
for the corresponding value of i is minimized, if not essentially zero. It is noted that, despite
inter-sensor intensity differences, equalization factors $\alpha_i(m)$ (i = 1, ..., I) should be maintained
close to unity for the purpose of coincidence detection; otherwise, the minimal $\delta X_n^{(i)}(m)$ will
not correspond to the in-phase (coincidence) locations.

An alternative approach may be based on identifying coincidence loci from the phase
difference. For this phase difference approach, the minima of the phase difference between
the left and right channel signals at positions along the dual delay lines 342, as indexed by i,
are located as described by the following equations (26) and (27):

$$i_n(m) = \arg\min_i \{\delta X_n^{(i)}(m)\}, \quad m = 1, \ldots, M/2, \tag{26}$$

$$\delta X_n^{(i)}(m) = \left|\mathrm{Im}\left[X_{Ln}^{(i)}(m)\,X_{Rn}^{(i)}(m)^{\dagger}\right]\right|, \quad i = 1, \ldots, I;\; m = 1, \ldots, M/2, \tag{27}$$

where, Im[·] denotes the imaginary part of the argument, and the superscript † denotes a
complex conjugate. Since the phase difference technique detects the minimum angle
between two complex vectors, there is also no need to compensate for the inter-sensor
intensity difference.
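Both coincidence measures, equation (23) and equation (27), can be compared with a short Python sketch for a single source and a single frequency. It is only an illustration under stated assumptions: the delays are expressed in seconds and applied with an angular frequency `omega` rather than the sample-based exponent of equations (24) and (25), stage indices are 0-based, and the function names and numeric values are hypothetical.

```python
import cmath
import math

def stage_delays(num_stages, itd_max):
    """Delays of equation (1), 0-indexed (tau[0] corresponds to i = 1)."""
    return [0.5 * itd_max * math.sin(i / (num_stages - 1) * math.pi - math.pi / 2)
            for i in range(num_stages)]

def coincidence_index(x_left, x_right, omega, tau, method="magnitude"):
    """Return the 0-indexed stage minimizing equation (23) or equation (27)."""
    last = len(tau) - 1

    def mismatch(i):
        xl = x_left * cmath.exp(-1j * omega * tau[i])          # left delayed
        xr = x_right * cmath.exp(-1j * omega * tau[last - i])  # right delayed
        if method == "magnitude":
            return abs(xl - xr)                    # equation (23)
        return abs((xl * xr.conjugate()).imag)     # equation (27)

    return min(range(len(tau)), key=mismatch)

tau = stage_delays(9, 0.15 / 343.0)
omega = 2 * math.pi * 500.0          # low enough to avoid phase ambiguity here
amplitude, position = 0.8 + 0.3j, 6
x_left = amplitude * cmath.exp(1j * omega * tau[position])
x_right = amplitude * cmath.exp(1j * omega * tau[len(tau) - 1 - position])
```

Calling `coincidence_index(x_left, x_right, omega, tau)` returns the source stage, and the phase method still returns it when the right channel is scaled, for example `coincidence_index(x_left, 0.5 * x_right, omega, tau, "phase")`, which reflects the stated insensitivity to inter-sensor intensity difference.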

While either the magnitude or phase difference approach may be effective without
further processing to localize a single source, multiple sources often emit spectrally
overlapping signals that lead to coincidence loci which correspond to nonexistent or phantom
sources (e.g., at the midpoint between two equal intensity sources at the same frequency).
Fig. 17 illustrates a 2-D coincidence plot 500 in terms of frequency in Hertz (Hz) along the
vertical axis and azimuth position in degrees along the horizontal axis. Plot 500 indicates two
sources corresponding to the generally vertically aligned locus 512a at about -20 degrees
and the vertically aligned locus 512b at about +40 degrees. Plot 500 also includes
misidentified or phantom source points 514a, 514b, 514c, 514d, 514e at other azimuth
positions that correspond to frequencies where both sources have significant energy. For
more than two differently located competing acoustic sources, an even more complex plot
generally results.

To reduce the occurrence of phantom information in the 2-D coincidence plot data,
localization operator 460 integrates over time and frequency. When the signals are not
correlated at each frequency, the mutual interference between the signals can be gradually
attenuated by the temporal integration. This approach averages the locations of the
coincidences, not the value of the function used to determine the minima, which is equivalent
to applying a Kronecker delta function, $\delta(i - i_n(m))$, to $\delta X_n^{(i)}(m)$ and averaging the $\delta(i - i_n(m))$
over time. In turn, the coincidence loci corresponding to the true position of the sources are
enhanced. Integration over time applies a forgetting average to the 2-D coincidence plots
acquired over a predetermined set of transform time frames from n = 1, ..., N, and is expressed
by the summation approximation of equation (28) as follows:

$$P_N(\theta_i, m) = \sum_{n=1}^{N} \beta^{N-n}\,\delta(i - i_n(m)), \quad i = 1, \ldots, I;\; m = 1, \ldots, M/2, \tag{28}$$

where, $0 < \beta < 1$ is a weighting coefficient which exponentially de-emphasizes (or forgets) the
effect of previous coincidence results, $\delta(\cdot)$ is the Kronecker delta function, $\theta_i$ represents the
position along the dual delay-lines 342 corresponding to spatial azimuth $\theta_i$ [equation (2)],
and N refers to the current time frame. To reduce the cluttering effect due to instantaneous
interactions of the acoustic sources, the results of equation (28) are tested in accordance with
the relationship defined by equation (29) as follows:

$$P_N(\theta_i, m) = \begin{cases} P_N(\theta_i, m), & P_N(\theta_i, m) \geq \Gamma, \\ 0, & \text{otherwise}, \end{cases} \tag{29}$$

where $\Gamma \geq 0$ is an empirically determined threshold. While this approach assumes the
inter-sensor delays are independent of frequency, it has been found that departures from this
assumption may generally be considered negligible.
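The forgetting average of equation (28) and the threshold test of equation (29) can be sketched in Python. The sketch uses the recursive form P_N = β·P_{N-1} + δ(i - i_N(m)), which is algebraically equivalent to the summation of equation (28); the function names, the 0-based indexing, and the values β = 0.5 and Γ = 1.0 are assumptions for illustration only.

```python
def update_coincidence_plot(plot, frame_indices, beta):
    """Apply one time frame to plot[i][m] using the recursive form of eq. (28).

    frame_indices[m] is the coincidence position i_n(m) for frequency m
    (0-based indices, an assumption of this sketch).
    """
    for i, row in enumerate(plot):
        for m in range(len(row)):
            hit = 1.0 if frame_indices[m] == i else 0.0  # Kronecker delta
            row[m] = beta * row[m] + hit
    return plot

def apply_threshold(plot, gamma):
    """Equation (29): keep entries at or above gamma, zero the rest."""
    return [[p if p >= gamma else 0.0 for p in row] for row in plot]

# Three stages, two frequency bins, three time frames.
plot = [[0.0, 0.0] for _ in range(3)]
for frame_indices in ([0, 2], [0, 1], [0, 1]):
    update_coincidence_plot(plot, frame_indices, beta=0.5)
cleaned = apply_threshold(plot, gamma=1.0)
```

In this example the isolated early coincidence at stage index 2 decays to 0.25 and is removed by the threshold, while the persistent coincidences are retained.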



By integrating the coincidence plots across frequency, a more robust and reliable
indication of the locations of sources in space is obtained. Integration of $P_N(\theta_i, m)$ over
frequency produces a localization pattern which is a function of azimuth. Two techniques to
estimate the true position of the acoustic sources may be utilized. The first estimation
technique is solely based on the straight vertical traces across frequency that correspond to
different azimuths. For this technique, $\theta_d$ denotes the azimuth with which the integration is
associated, such that $\theta_d = \theta_i$, and results in the summation over frequency of equation (30) as
follows:



$$H_N(\theta_d) = \sum_{m} P_N(\theta_d, m), \quad d = 1, \ldots, I, \tag{30}$$

where equation (30) approximates integration over frequency.

The peaks in $H_N(\theta_d)$ represent the source azimuth positions. If there are Q sources, Q
peaks in $H_N(\theta_d)$ may generally be expected. When compared with the patterns $\delta(i - i_n(m))$ at
each frequency, not only is the accuracy of localization enhanced when more than one sound
source is present, but also almost immediate localization of multiple sources for the current
frame is possible. Furthermore, although a dominant source usually has a higher peak in
$H_N(\theta_d)$ than do weaker sources, the height of a peak in $H_N(\theta_d)$ only indirectly reflects the
energy of the sound source. Rather, the height is influenced by several factors such as the
energy of the signal component corresponding to $\theta_d$ relative to the energy of the other signal
components for each frequency band, the number of frequency bands, and the duration over
which the signal is dominant. In fact, each frequency is weighted equally in equation (28).
As a result, masking of weaker sources by a dominant source is reduced. In contrast, existing
time-domain cross-correlation methods incorporate the signal intensity, more heavily biasing
sensitivity to the dominant source.
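
As a minimal sketch of the first estimation technique, the summation of equation (30) and a simple peak search may be written as follows. The peak-picking rule (strict local maxima above a minimum height) and all names are assumptions introduced for illustration; the description above does not prescribe a particular peak detector.

```python
def integrate_over_frequency(P):
    """Equation (30): H[d] is the sum over frequency bins m of P[m][d]."""
    num_taps = len(P[0])
    return [sum(row[d] for row in P) for d in range(num_taps)]


def find_peaks(H, min_height=0.0):
    """Indices of strict local maxima of H that exceed min_height."""
    peaks = []
    for d in range(len(H)):
        left = H[d - 1] if d > 0 else float("-inf")
        right = H[d + 1] if d < len(H) - 1 else float("-inf")
        if H[d] > min_height and H[d] > left and H[d] > right:
            peaks.append(d)
    return peaks


# Usage: 3 frequency bins, 5 taps; coincidences cluster at taps 1 and 4.
P = [[0, 3, 0, 0, 1],
     [0, 2, 0, 0, 2],
     [0, 3, 0, 1, 0]]
H = integrate_over_frequency(P)
source_taps = find_peaks(H)
```

Because every frequency bin contributes a count rather than an energy, a weaker source that dominates only a few bins still produces a distinct peak.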

Notably, the interaural time difference is ambiguous for high frequency sounds where
the acoustic wavelengths are less than the separation distance D between sensors 22, 24.
This ambiguity arises from the occurrence of phase multiples above this inter-sensor distance
related frequency, such that a particular phase difference $\Delta\phi$ cannot be distinguished from
$\Delta\phi + 2\pi$. As a result, there is not a one-to-one relationship of position versus frequency above a
certain frequency. Thus, in addition to the primary vertical trace corresponding to $\theta_d = \theta_i$,
there are also secondary relationships that characterize the variation of position with
frequency for each ambiguous phase multiple. These secondary relationships are taken into
account for the second estimation technique for integrating over frequency. Equation (31)
provides a means to determine a predictive coincidence pattern for a given azimuth that
accounts for these secondary relationships as follows:

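$$\sin\theta_i - \sin\theta_d = \frac{\gamma_{m,d}}{\mathrm{ITD}_{max}\, f_m}, \tag{31}$$

where the parameter $\gamma_{m,d}$ is an integer, and each value of $\gamma_{m,d}$ defines a contour in the pattern
$P_N(\theta_i, m)$. The primary relationship is associated with $\gamma_{m,d} = 0$. For a specific $\theta_d$, the range of
valid $\gamma_{m,d}$ is given by equation (32) as follows:

$$-\mathrm{ITD}_{max}\, f_m\, (1 + \sin\theta_d) \le \gamma_{m,d} \le \mathrm{ITD}_{max}\, f_m\, (1 - \sin\theta_d). \tag{32}$$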
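
For illustration only, the contours defined by equations (31) and (32) may be computed as sketched below. The value of $\mathrm{ITD}_{max}$ in the usage lines is an assumption, taken as the sensor spacing divided by a nominal speed of sound of 343 m/s; the function names are hypothetical.

```python
import math


def valid_gammas(theta_d_deg, f_m, itd_max):
    """Integer values of gamma permitted by equation (32)."""
    s_d = math.sin(math.radians(theta_d_deg))
    lo = math.ceil(-itd_max * f_m * (1.0 + s_d))
    hi = math.floor(itd_max * f_m * (1.0 - s_d))
    return list(range(lo, hi + 1))


def stencil_azimuths(theta_d_deg, f_m, itd_max):
    """(gamma, theta_i in degrees) pairs satisfying equation (31)."""
    s_d = math.sin(math.radians(theta_d_deg))
    pairs = []
    for g in valid_gammas(theta_d_deg, f_m, itd_max):
        x = s_d + g / (itd_max * f_m)
        x = max(-1.0, min(1.0, x))  # guard against rounding outside [-1, 1]
        pairs.append((g, math.degrees(math.asin(x))))
    return pairs


# Usage: assumed ITD_max for a 144 mm spacing and c = 343 m/s (assumption).
ITD_MAX = 0.144 / 343.0
low = valid_gammas(0.0, 1000.0, ITD_MAX)       # only the primary contour
high = stencil_azimuths(0.0, 3000.0, ITD_MAX)  # primary plus two secondary
```

At the lower frequency only $\gamma_{m,d} = 0$ is valid, so the pattern reduces to the primary vertical trace; at the higher frequency secondary contours appear on either side of it.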



The graph 600 of Fig. 18 illustrates a number of representative coincidence patterns 
612, 614, 616, 618 determined in accordance with equations (31) and (32); where the vertical 
axis represents frequency in Hz and the horizontal axis represents azimuth position in 
degrees. Pattern 612 corresponds to the azimuth position of 0°. Pattern 612 has a primary 
relationship corresponding to the generally straight, solid vertical line 612a and a number of 
secondary relationships corresponding to curved solid line segments 612b. Similarly, 
patterns 614, 616, 618 correspond to azimuth positions of -75°, 20°, and 75° and have primary 
relationships shown as straight vertical lines 614a, 616a, 618a and secondary relationships 
shown as curved line segments 614b, 616b, 618b, in correspondingly different broken line 
formats. In general, the vertical lines are designated primary contours and the curved line 
segments are designated secondary contours. Coincidence patterns for other azimuth 
positions may be determined with equations (31) and (32) as would occur to those skilled in 
the art. 

Notably, the existence of these ambiguities in $P_N(\theta_i, m)$ may generate artifactual peaks
in $H_N(\theta_d)$ after integration along $\theta_d = \theta_i$. Superposition of the curved traces corresponding to
several sources may induce a noisier $H_N(\theta_d)$ term. When far away from the peaks of any real
sources, the artifact peaks may erroneously indicate the detection of nonexistent sources;
however, when close to the peaks corresponding to true sources, they may affect both the
detection and localization of peaks of real sources in $H_N(\theta_d)$. When it is desired to reduce the
adverse impact of phase ambiguity, localization may take into account the secondary
relationships in addition to the primary relationship for each given azimuth position. Thus, a
coincidence pattern for each azimuthal direction $\theta_d$ ($d = 1, \ldots, I$) of interest may be determined
and plotted that may be utilized as a "stencil" window having a shape defined by $P_N(\theta_i, m)$
($i = 1, \ldots, I$; $m = 1, \ldots, M$). In other words, each stencil is a predictive pattern of the
coincidence points attributable to an acoustic source at the azimuth position of the primary
contour, including phantom loci corresponding to other azimuth positions as a factor of
frequency. The stencil pattern may be used to filter the data at different values of m.

By employing equation (32), the integration approximation of equation (30) is
modified as reflected in the following equation (33):

$$H_N(\theta_d) = \frac{1}{A(\theta_d)} \sum_{m} \sum_{\gamma_{m,d}} P_N\!\left[\sin^{-1}\!\left(\frac{\gamma_{m,d}}{\mathrm{ITD}_{max}\, f_m} + \sin\theta_d\right),\, m\right], \tag{33}$$

where $A(\theta_d)$ denotes the number of points involved in the summation. Notably, equation (30)
is a special case of equation (33) corresponding to $\gamma_{m,d} = 0$. Thus, equation (33) is used in
place of equation (30) when the second technique of integration over frequency is desired.
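
A minimal sketch of stencil filtering in the manner of equation (33) is given below for a single candidate azimuth. It assumes an azimuth-uniform tap grid spanning -90° to +90°, nearest-tap rounding of each contour point, strictly positive bin frequencies, and normalization by the number of summed points; these choices and all names are illustrative assumptions rather than requirements of the technique.

```python
import math


def stencil_sum(P, freqs, azimuths_deg, d, itd_max):
    """Average of P over the stencil of candidate tap d (equation (33)).

    P[m][i] is the coincidence plot, freqs[m] the frequency in Hz of bin m,
    and azimuths_deg[i] the azimuth of tap i on a uniform -90..+90 grid.
    """
    step = azimuths_deg[1] - azimuths_deg[0]
    s_d = math.sin(math.radians(azimuths_deg[d]))
    total, count = 0.0, 0
    for m, f_m in enumerate(freqs):
        if f_m <= 0.0:
            continue  # the contour relation is undefined at 0 Hz
        k = itd_max * f_m
        lo = math.ceil(-k * (1.0 + s_d))   # lower bound of equation (32)
        hi = math.floor(k * (1.0 - s_d))   # upper bound of equation (32)
        for g in range(lo, hi + 1):
            x = max(-1.0, min(1.0, s_d + g / k))
            theta_i = math.degrees(math.asin(x))
            i = round((theta_i - azimuths_deg[0]) / step)  # nearest tap
            total += P[m][i]
            count += 1
    return total / count if count else 0.0


# Usage: 0.5 degree grid; coincidences only on the primary trace at 0 degrees.
ITD_MAX = 0.144 / 343.0  # assumed: 144 mm spacing, c = 343 m/s
azimuths_deg = [-90.0 + 0.5 * i for i in range(361)]
freqs = [1000.0, 3000.0]
P = [[0.0] * 361 for _ in freqs]
for row in P:
    row[180] = 1.0
score = stencil_sum(P, freqs, azimuths_deg, 180, ITD_MAX)
```

In this toy case the stencil for 0° visits four points (one at 1 kHz, three at 3 kHz), two of which hold a coincidence, so the normalized score is one half.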

As shown in equation (2), both variables $\theta_i$ and $\tau_i$ are equivalent and represent the
position in the dual delay-line. The difference between these variables is that $\theta_i$ indicates
location along the dual delay-line by using its corresponding spatial azimuth, whereas $\tau_i$
denotes location by using the corresponding time-delay unit of value $\tau_i$. Therefore, the
stencil pattern becomes much simpler if the stencil filter function is expressed with $\tau_i$ as
defined in the following equation (34):



*Y mx 



(34) 



where $\tau_d$ relates to $\theta_d$ through equation (4). For a specific $\tau_d$, the range of valid $\gamma_{m,d}$ is given
by equation (35) as follows:

$$-\left(\frac{\mathrm{ITD}_{max}}{2} + \tau_d\right) f_m \le \gamma_{m,d} \le \left(\frac{\mathrm{ITD}_{max}}{2} - \tau_d\right) f_m, \quad \gamma_{m,d} \text{ is an integer.} \tag{35}$$



Changing the value of $\tau_d$ only shifts the coincidence pattern (or stencil pattern) along the $\tau_i$-axis
without changing its shape. The approach characterized by equations (34) and (35) may be
utilized as an alternative to separate patterns for each azimuth position of interest; however,
because the scaling of the delay units $\tau_i$ is uniform along the dual delay-line, azimuthal
partitioning by the dual delay-line is not uniform, with the regions close to the median plane
having higher azimuthal resolution. On the other hand, in order to obtain an equivalent
resolution in azimuth, using a uniform $\tau_i$ would require a much larger number $I$ of delay units than
using a uniform $\theta_i$.
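
The non-uniform azimuthal partitioning produced by uniformly spaced delay units may be illustrated numerically as follows. The sketch assumes that each delay is proportional to the sine of its azimuth, so that evenly spaced delays correspond to evenly spaced values of $\sin\theta_i$ on $[-1, 1]$; the constant of proportionality cancels out. The function name is hypothetical.

```python
import math


def azimuths_for_uniform_delays(num_taps):
    """Azimuths (degrees) of taps whose delays are evenly spaced.

    Assumes delay proportional to sin(azimuth), so sin(theta_i) is evenly
    spaced from -1 to +1 across the delay line.
    """
    return [math.degrees(math.asin(-1.0 + 2.0 * i / (num_taps - 1)))
            for i in range(num_taps)]


azimuths = azimuths_for_uniform_delays(361)
steps = [b - a for a, b in zip(azimuths, azimuths[1:])]
step_near_median = steps[len(steps) // 2]  # spacing next to 0 degrees
step_near_90 = steps[-1]                   # spacing next to +90 degrees
```

With 361 delay units the spacing is roughly 0.3° near the median plane but about 6° next to ±90°, so matching a uniform 0.5° azimuth grid everywhere would take many more uniformly spaced delay units.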

The signal flow diagram of Fig. 16 further illustrates selected details concerning
localization operator 460. With equalization factors $\alpha_i(m)$ set to unity, the delayed signals of
pairs of delay stages 344 are sent to coincidence detection operators 462 for each frequency
indexed to m to determine the coincidence points. Detection operators 462 determine the
minima in accordance with equation (22) or (26). Each coincidence detection operator 462
sends the results $i_n(m)$ to a corresponding pattern generator 464 for the given m. Generators
464 build a 2-D coincidence plot for each frequency indexed to m and pass the results to a
corresponding summation operator 466 to perform the operation expressed in equation (28)
for that given frequency. Summation operators 466 approximate integration over time. In
Fig. 16, only operators 462, 464, and 466 corresponding to $m = 1$ and $m = M$ are illustrated to
preserve clarity, with those corresponding to $m = 2$ through $m = M-1$ being represented by
ellipses.

Summation operators 466 pass results to summation operator 468 to approximate 
integration over frequency. Operator 468 may be configured in accordance with equation
(30) if artifacts resulting from the secondary relationships at high frequencies are not present 
or may be ignored. Alternatively, stencil filtering with predictive coincidence patterns that 
include the secondary relationships may be performed by applying equation (33) with 
summation operator 468. 

Referring back to Fig. 15, operator 468 outputs $H_N(\theta_d)$ to output device 490 to map
corresponding acoustic source positional information. Device 490 preferably includes a
display or printer capable of providing a map representative of the spatial arrangement of the
acoustic sources relative to the predetermined azimuth positions. In addition, the acoustic
sources may be localized and tracked dynamically as they move in space. Movement
trajectories may be estimated from the sets of locations $\delta(i - i_n(m))$ computed at each sample
window $n$. For other embodiments incorporating system 410 into a small portable unit, such
as a hearing aid, output device 490 is preferably not included. In still other embodiments,
output device 90 may not be included.

The localization techniques of localization operator 460 are particularly suited to 
localize more than two acoustic sources of comparable sound pressure levels and frequency 
ranges, and need not specify an on-axis desired source. As such, the localization techniques 
of system 410 provide independent capabilities to localize and map more than two acoustic 
sources relative to a number of positions as defined with respect to sensors 22, 24. However, 
in other embodiments, the localization capability of localization operator 460 may also be 
utilized in conjunction with a designated reference source to perform extraction and noise 
suppression. Indeed, extraction operator 480 of the illustrated embodiment incorporates such 
features as more fully described hereinafter. 

Existing systems based on a two sensor detection arrangement generally only attempt 
to suppress noise attributed to the most dominant interfering source through beamforming. 
Unfortunately, this approach is of limited value when there are a number of comparable 
interfering sources at proximal locations. 

It has been discovered that by suppressing one or more different frequency 
components in each of a plurality of interfering sources after localization, it is possible to 
reduce the interference from the noise sources in complex acoustic environments, such as in 
the case of multi-talkers, in spite of the temporal and frequency overlaps between talkers. 
Although a given frequency component or set of components may only be suppressed in one 
of the interfering sources for a given time frame, the dynamic allocation of suppression of 
each of the frequencies among the localized interfering acoustic sources generally results in
better intelligibility of the desired signal than is possible by simply nulling only the most 
offensive source at all frequencies. 

Extraction operator 480 provides one implementation of this approach by utilizing
localization information from localization operator 460 to identify Q interfering noise sources
corresponding to positions other than $i = s$. The positions of the Q noise sources are
represented by $i = noise1, noise2, \ldots, noiseQ$. Notably, operator 480 receives the outputs of
signal operator 350 as described in connection with system 310, which presents corresponding
signals $X_n^{(i=noise1)}(m)$, $X_n^{(i=noise2)}(m)$, ..., $X_n^{(i=noiseQ)}(m)$ for each frequency m. These signals
include a component of the desired signal at frequency m as well as components from sources
other than the one to be canceled. For the purpose of extraction and suppression, the
equalization factors $\alpha_i(m)$ need not be set to unity once localization has taken place. To
determine which frequency component or set of components to suppress in a particular noise
source, the amplitudes of $X_n^{(i=noise1)}(m)$, $X_n^{(i=noise2)}(m)$, ..., $X_n^{(i=noiseQ)}(m)$ are calculated and
compared. The minimum $X_n^{(inoise)}(m)$ is taken as output $S_n(m)$ as defined by the following
equation (36):

$$S_n(m) = X_n^{(inoise)}(m), \tag{36}$$

where $X_n^{(inoise)}(m)$ satisfies the condition expressed by equation (37) as follows:

$$\left|X_n^{(inoise)}(m)\right| = \min\left\{ \left|X_n^{(i=noise1)}(m)\right|, \left|X_n^{(i=noise2)}(m)\right|, \ldots, \left|X_n^{(i=noiseQ)}(m)\right|, \left|\alpha_s(m)\, X_{Ln}^{(s)}(m)\right| \right\}, \tag{37}$$

for each value of m. It should be noted that, in equation (37), the original signal $\alpha_s(m)\, X_{Ln}^{(s)}(m)$
is included. The resulting beam pattern may at times amplify other less intense
noise sources. When the amount of noise amplification is larger than the amount of
cancellation of the most intense noise source, further conditions may be included in operator
480 to prevent changing the input signal for that frequency at that moment.
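
The selection rule of equations (36) and (37) may be sketched per frequency bin as follows. The sketch assumes the candidate signals have already been formed (one per localized interferer, plus the original equalized signal) and are supplied as complex spectral values; the function names and data layout are hypothetical.

```python
def extract_bin(nulled_outputs, original):
    """Equations (36)-(37) for one frequency bin.

    nulled_outputs: complex values, one per localized interfering source,
    each obtained with that source canceled.
    original: the equalized original signal for the same bin, included so
    that the output magnitude never exceeds that of the unprocessed signal.
    """
    candidates = list(nulled_outputs) + [original]
    return min(candidates, key=abs)  # smallest magnitude wins


def extract_frame(nulled_by_bin, original_by_bin):
    """Apply the per-bin selection across all frequency bins of a frame."""
    return [extract_bin(c, o) for c, o in zip(nulled_by_bin, original_by_bin)]


# Usage: two bins, two interferers each.
S = extract_frame(
    nulled_by_bin=[[3 + 4j, 1 + 0j], [2 + 0j, 0 + 3j]],
    original_by_bin=[0 + 2j, 0 + 1j],
)
```

In the first bin the second candidate has the smallest magnitude and is taken as the output; in the second bin neither candidate improves on the original signal, which is therefore passed through.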

Processors 30, 330, 430 include one or more components that embody the 
corresponding algorithms, stages, operators, converters, generators, arrays, procedures, 
processes, and techniques described in the respective equations and signal flow diagrams in 
software, hardware, or both utilizing techniques known to those skilled in the art. Processors 
30, 330, 430 may be of any type as would occur to those skilled in the art; however, it is 
preferred that processors 30, 330, 430 each be based on a solid-state, integrated digital signal 
processor with dedicated hardware to perform the necessary operations with a minimum of 
other components. 

Systems 310, 410 may be sized and adapted for application as a hearing aid of the
type described in connection with Fig. 4A. In a further hearing aid embodiment, sensors
22, 24 are sized and shaped to fit in the pinnae of a listener, and the processor
algorithms are adjusted to account for shadowing caused by the head and torso. This
adjustment may be provided by deriving a Head-Related-Transfer-Function (HRTF) specific
to the listener or from a population average using techniques known to those skilled in the art.
This function is then used to provide appropriate weightings of the dual delay stage output
signals that compensate for shadowing.

In yet another embodiment, systems 310, 410 are adapted to voice recognition systems
of the type described in connection with Fig. 4B. In still other embodiments, systems 310,
410 may be utilized in sound source mapping applications, or as would otherwise occur to
those skilled in the art.

It is contemplated that various signal flow operators, converters, functional blocks,
generators, units, stages, processes, and techniques may be altered, rearranged, substituted,
deleted, duplicated, combined or added as would occur to those skilled in the art without
departing from the spirit of the present inventions. In one further embodiment, a signal
processing system according to the present invention includes a first sensor configured to
provide a first signal corresponding to an acoustic excitation; where this excitation includes a
first acoustic signal from a first source and a second acoustic signal from a second source
displaced from the first source. The system also includes a second sensor displaced from the
first sensor that is configured to provide a second signal corresponding to the excitation.
Further included is a processor responsive to the first and second sensor signals that has
means for generating a desired signal with a spectrum representative of the first acoustic
signal. This means includes a first delay line having a number of first taps to provide a
number of delayed first signals and a second delay line having a number of second taps to
provide a number of delayed second signals. The system also includes output means for
generating a sensory output representative of the desired signal. In another embodiment, a
method of signal processing includes detecting an acoustic excitation at both a first location
to provide a corresponding first signal and at a second location to provide a corresponding
second signal. The excitation is a composite of a desired acoustic signal from a first source
and an interfering acoustic signal from a second source that is spaced apart from the first
source. This method also includes spatially localizing the second source relative to the first
source as a function of the first and second signals and generating a characteristic signal
representative of the desired acoustic signal during performance of this localization.

EXPERIMENTAL SECTION



The following experimental results are provided as merely illustrative examples to 
enhance understanding of the present invention, and should not be construed to restrict or 
limit the scope of the present invention. 


EXAMPLE ONE 

A Sun Sparc-20 workstation was programmed to emulate the signal extraction process
of the present invention. One loudspeaker (L1) was used to emit a speech signal and another
loudspeaker (L2) was used to emit babble noise in a semi-anechoic room. Two microphones
of a conventional type were positioned in the room and operatively coupled to the
workstation. The microphones had an inter-microphone distance of about 15 centimeters
and were positioned about 3 feet from L1. L1 was aligned with the midpoint between the
microphones to define a zero degree azimuth. L2 was placed at different azimuths relative to
L1 approximately equidistant to the midpoint between L1 and L2.

Referring to FIG. 5, a clean speech signal of a sentence about two seconds long is depicted,
emanating from L1 without interference from L2. FIG. 6 depicts a composite signal from L1
and L2. The composite signal includes babble noise from L2 combined with the speech
signal depicted in FIG. 5. The babble noise and speech signal are of generally equal intensity
(0 dB) with L2 placed at a 60 degree azimuth relative to L1. FIG. 7 depicts the signal
recovered from the composite signal of FIG. 6. This signal is nearly the same as the signal of
FIG. 5.

FIG. 8 depicts another composite signal where the babble noise is 30 dB more intense
than the desired signal of FIG. 5. Furthermore, L2 is placed at only a 2 degree azimuth
relative to L1. FIG. 9 depicts the signal recovered from the composite signal of FIG. 8,
providing a clearly intelligible representation of the signal of FIG. 5 despite the greater
intensity of the babble noise from L2 and the nearby location.



EXAMPLE TWO 

Experiments corresponding to system 410 were conducted with two groups having four
talkers (2 male, 2 female) in each group. Five different tests were conducted for each group
with different spatial configurations of the sources in each test. The four talkers were
arranged in correspondence with sources 412, 414, 416, 418 of Fig. 14 with different values
for angles 412a, 414a, 416a, and 418a in each test. The illustration in Fig. 14 most closely
corresponds to the first test with angle 418a being -75 degrees, angle 412a being 0 degrees,
angle 414a being +20 degrees, and angle 416a being +75 degrees. The coincidence patterns
612, 614, 616, and 618 of Fig. 18 also correspond to the azimuth positions of -75 degrees, 0
degrees, +20 degrees, and +75 degrees.

The experimental set-up for the tests utilized two microphones for sensors 22, 24 with
an inter-microphone distance of about 144 mm. No diffraction or shadowing effect existed
between the two microphones, and the inter-microphone intensity difference was set to zero
for the tests. The signals were low-pass filtered at 6 kHz and sampled at a 12.8-kHz rate with
16-bit quantization. A Wintel-based computer was programmed to receive the quantized
signals for processing in accordance with the present invention and output the test results
described hereinafter. In the short-term spectral analysis, a 20-ms segment of signal was
weighted by a Hamming window and then padded with zeros to 2048 points for DFT, and
thus the frequency resolution was about 6 Hz. The values of the time delay units $\tau_i$ ($i = 1, \ldots, I$)
were determined such that the azimuth resolution of the dual delay-line was 0.5° uniformly,
namely $I = 361$. The dual delay-line used in the tests was azimuth-uniform. The coincidence
detection method was based on minimum magnitude differences.
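
The figures quoted for the analysis parameters can be checked with a few lines of arithmetic, sketched below. The variable names are illustrative, and the assumption that the azimuth grid spans -90° to +90° inclusive is made here to account for the 361 delay units.

```python
sample_rate_hz = 12800.0   # 12.8-kHz sampling rate
segment_s = 0.020          # 20-ms analysis segment
dft_points = 2048          # zero-padded DFT length
azimuth_step_deg = 0.5     # azimuth resolution of the dual delay-line

# Samples in one analysis segment before zero padding.
samples_per_segment = round(sample_rate_hz * segment_s)

# Spacing between adjacent DFT bins (the "about 6 Hz" figure).
freq_resolution_hz = sample_rate_hz / dft_points

# Taps needed to cover -90..+90 degrees inclusive (assumed span).
num_taps = round(180.0 / azimuth_step_deg) + 1
```

Each 256-sample segment is thus padded by a factor of eight, the bin spacing is 6.25 Hz, and the 0.5° grid yields the stated 361 delay units.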

Each of the five tests consisted of four subtests in which a different talker was taken as
the desired source. To test the system performance under the most difficult experimental
constraint, the speech materials (four equally-intense spondaic words) were intentionally
aligned temporally. The speech material was presented in free-field. The localization of the
talkers was done using both the equation (30) and equation (33) techniques.

The system performance was evaluated using an objective intelligibility-weighted
measure, as proposed in Peterson, P.M., "Adaptive array processing for multiple microphone
hearing aids," Ph.D. Dissertation, Dept. Elect. Eng. and Comp. Sci., MIT; Res. Lab. Elect.
Tech. Rept. 541, MIT, Cambridge, MA (1989), and described in detail in Liu, C. and
Sideman, S., "Simulation of fixed microphone arrays for directional hearing aids," J. Acoust.
Soc. Am. 100, 848-856 (1996). Specifically, intelligibility-weighted signal cancellation,
intelligibility-weighted noise cancellation, and net intelligibility-weighted gain were used.
The experimental results are presented in Tables I, II, III, and IV of FIGs. 19-22,
respectively. The five tests described in Table I of FIG. 19 approximate integration over
frequency by utilizing equation (30); and include two male speakers M1, M2 and two
female speakers F1, F2. The five tests described in Table II of FIG. 20 are the same as Table
I, except that integration over frequency was approximated by equation (33). The five tests
described in Table III of FIG. 21 approximate integration over frequency by utilizing
equation (30); and include two different male speakers M3, M4 and two different female
speakers F3, F4. The five tests described in Table IV of FIG. 22 are the same as Table III,
except that integration over frequency was approximated by equation (33).

For each test, the data was arranged in a matrix with the numbers on the diagonal line
representing the degree of noise cancellation in dB of the desired source (ideally 0 dB) and
the numbers elsewhere representing the degree of noise cancellation for each noise source.
The next to the last column shows a degree of cancellation of all the noise sources lumped
together, while the last column gives the net intelligibility-weighted improvement (which
considers both noise cancellation and loss in the desired signal).

The results generally show cancellation in the intelligibility-weighted measure in a
range of about 3-11 dB, while degradation of the desired source was generally less than
about 0.1 dB. The total noise cancellation was in the range of about 8-12 dB. Comparison
of the various Tables suggests very little dependence on the talker or the speech materials
used in the tests. Similar results were obtained from six-talker experiments. Generally, a
7-10 dB enhancement in the intelligibility-weighted signal-to-noise ratio resulted when there
were six equally loud, temporally aligned speech sounds originating from six different
loudspeakers.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application were
specifically and individually indicated to be incorporated by reference, including, but not
limited to commonly owned U.S. Patent Application Serial No. 08/666,757 filed on 19 June
1996 and U.S. Patent Application Serial No. 08/193,158 filed on 16 November 1998.
Further, any theory, mechanism of operation, proof, or finding stated herein is meant to
further enhance understanding of the present invention and is not intended to make the
present invention or the scope of the invention as defined by the following claims in any way
dependent upon such theory, mechanism of operation, proof, or finding. While the invention
has been illustrated and described in detail in the drawings and foregoing description, the
same is to be considered as illustrative and not restrictive in character, it being understood
that only selected embodiments have been shown and described and that all changes,
modifications, and equivalents that come within the spirit of the invention defined by the
following claims are desired to be protected.